Environment Set Up¶
Code and Data Download¶
Prerequisites¶
- Python 3 or 3.11.5 installed on your system
- pip package manager
Create Virtual Environment¶
Step 1: Create Virtual Environment¶
python3.11 -m venv hypoxia_env
Step 2: Activate Virtual Environment¶
Windows:
hypoxia_env\Scripts\activate
Mac/Linux:
source hypoxia_env/bin/activate
Step 3: Verify Python Version¶
python --version
Should show: Python 3.11.5
Step 4: Upgrade pip¶
python -m pip install --upgrade pip
Step 5: Install Required Packages¶
pip install pandas numpy scikit-learn matplotlib seaborn jupyter scipy xarray rasterio ee geemap
Step 6: Optional Geospatial Packages (uncomment in code if needed)¶
pip install geopandas gdal arcpy
Step 7: Verify Installation¶
python -c "import pandas, numpy, xarray, rasterio, ee, geemap; print('All packages installed successfully!')"
Table of Contents¶
Hypoxia Prediction in the Gulf of Mexico Using Machine Learning¶
Overview¶
This project develops an end-to-end machine learning pipeline to predict hypoxic zones (dissolved oxygen < 2 mg/L) in the Gulf of Mexico using satellite-derived ocean color data from MODIS-Aqua and in-situ dissolved oxygen measurements from GCOOS (Gulf of Mexico Coastal Ocean Observing System). The pipeline includes data integration, model training, uncertainty quantification, spatial prediction on raster imagery, and temporal trend analysis.
Objectives¶
- Primary Goal: Build robust binary classifiers to predict hypoxia occurrence from 14 satellite-derived features (chlorophyll-a, sea surface temperature, remote sensing reflectance bands)
- Spatial Mapping: Generate georeferenced prediction rasters with uncertainty quantification for spatial extent analysis
- Temporal Analysis: Quantify hypoxia zone area over time (2011-2020) to identify seasonal patterns and long-term trends
- Model Interpretation: Provide transparent model insights through feature importance, SHAP values, and error analysis
Data Sources¶
In-Situ Measurements¶
- Source: GCOOS dissolved oxygen observations (1970-present)
- Format: CSV with location, timestamp, DO concentration (mg/L)
- Processing: Temporal aggregation, outlier filtering, binary hypoxia classification (DO < 2 mg/L)
Satellite Imagery¶
- Source: MODIS-Aqua monthly Level-3 composites (Google Earth Engine)
- Resolution: ~4 km spatial resolution
- Features (14 bands):
- Chlorophyll-a concentration (
chlor_a) - Normalized Fluorescence Line Height (
nflh) - Particulate Organic Carbon (
poc) - Sea Surface Temperature (
sst) - Remote Sensing Reflectance at 412, 443, 469, 488, 531, 547, 555, 645, 667, 678 nm
- Chlorophyll-a concentration (
Coordinate Reference Systems¶
- Input Data: WGS84 (EPSG:4326) geographic coordinates
- Area Calculations: EPSG:3814 (State Plane, Gulf of Mexico) for accurate spatial quantification
Methodology¶
1. Data Loading & Exploration¶
- Load GCOOS dissolved oxygen measurements
- Visualize temporal trends of hypoxia events
- Compute decadal statistics and annual event counts
2. Data Integration¶
- Merge multi-year satellite feature CSVs (
train_*.csv) - Align in-situ DO measurements with satellite pixels
- Remove missing values and prepare balanced training sets
3. Model Development¶
Classifiers Trained (8 algorithms):
- Random Forest (baseline + hyperparameter-tuned)
- Logistic Regression
- Support Vector Classifier (SVC)
- K-Nearest Neighbors (KNN)
- Decision Tree
- Gaussian Naive Bayes
- XGBoost
- Multi-Layer Perceptron (MLP)
Training Strategy:
- Train/test split: 70%/30% with stratification
- Class balancing: Downsample majority class to match minority
- Preprocessing: MinMaxScaler for scaled models (LR, SVC, KNN, MLP)
- Cross-validation: 5-fold stratified k-fold for robustness metrics
Hyperparameter Tuning:
- Method: RandomizedSearchCV (12 iterations, 3-fold CV)
- Optimized for: F1 score (balances precision/recall for imbalanced data)
- Parameters tuned: n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features
4. Model Interpretation & Uncertainty Quantification¶
- Performance Metrics: Accuracy, balanced accuracy, precision, recall, F1, ROC-AUC
- Confusion Matrix & ROC/PR Curves: Evaluate classification quality
- Bootstrap Confidence Intervals: 200 resamples for metric uncertainty (95% CI)
- Probability Calibration: CalibratedClassifierCV with Platt scaling (sigmoid, 3-fold CV)
- Feature Importance: Tree-based importances, SHAP values, permutation importance
- Partial Dependence Plots (PDP/ICE): Visualize feature-target relationships
- Error Analysis: Compare false negatives vs. true positives; identify high-confidence errors
5. Spatial Prediction on Rasters¶
- Load multi-band MODIS GeoTIFFs (14 bands per image)
- Apply trained model to generate binary predictions (hypoxic/non-hypoxic)
- Generate uncertainty rasters (calibrated probabilities scaled to uint8 0-255)
- Preserve geospatial metadata (CRS, geotransform) for GIS compatibility
- Outputs:
- Binary prediction rasters:
*_hypoxia_pred.tif - Uncertainty rasters:
*_hypoxia_uncertainty.tif
- Binary prediction rasters:
6. Spatial Quantification & Temporal Analysis¶
- Reproject prediction rasters from WGS84 to EPSG:3814 for accurate area calculations
- Count pixels per class, convert to km² using projected resolution
- Export time series:
area.csvwith columns {file, class_0_area_sqkm, class_1_area_sqkm} - Temporal Visualizations:
- Monthly boxplots of hypoxia extent
- Seasonal violin plots (Spring/Summer/Winter)
- Time-series line plots with trend lines and mean baselines
- Annual average hypoxia intensity maps (2011-2020)
Key Features¶
✅ Comprehensive Model Comparison: Evaluate 8 classifiers with standardized metrics
✅ Hyperparameter Optimization: RandomizedSearchCV for efficient tuning
✅ Uncertainty Quantification: Bootstrap CIs + calibrated probabilities
✅ Model Transparency: SHAP, permutation importance, PDP/ICE plots
✅ Production-Ready Inference: Scalable raster prediction pipeline
✅ Geospatial Accuracy: Projected CRS for precise area quantification
✅ Temporal Trend Analysis: Multi-scale visualization (monthly/seasonal/annual)
✅ Reproducibility: Modular code with documented methodology
Requirements¶
Python Environment¶
- Python Version: 3.11.5+
- Key Libraries:
- Machine Learning: scikit-learn, xgboost, shap
- Geospatial: rasterio, rasterio.warp, geopandas, GDAL
- Data Processing: pandas, numpy, scipy
- Visualization: matplotlib, seaborn, cartopy
- Satellite Data: Google Earth Engine (geemap)
- Model Persistence: joblib
Installation¶
pip install numpy pandas matplotlib seaborn scikit-learn xgboost shap rasterio geopandas gdal geemap joblib cartopy
Project Structure¶
├── Hypoxia Prediction.ipynb # Main analysis notebook
├── README.md # This file
├── data/
│ ├── GCOOS_DO_*.csv # In-situ dissolved oxygen measurements
│ ├── train_*.csv # Multi-year satellite feature CSVs
│ ├── Hurricane.csv # Hurricane event metadata (optional)
│ ├── modis out/ # Input MODIS GeoTIFFs (14-band)
│ └── Predicted rast/ # Output prediction rasters
│ ├── *_hypoxia_pred.tif # Binary predictions
│ ├── *_hypoxia_uncertainty.tif # Confidence maps
│ └── area.csv # Time-series area quantification
├── models/
│ ├── best_model.pkl # Best-performing classifier
│ ├── RandomForest_tuned.pkl # Hyperparameter-tuned Random Forest
│ ├── LogisticRegression.pkl # Individual model artifacts
│ └── scaler.pkl # Fitted MinMaxScaler
└── figures/
├── study.jpg # study area
├── explore.jpg # exploratory plots
├── seasonal_analysis_v2.jpg # Temporal trend plots
└── average_raster_data.jpg # Annual average hypoxia maps
Workflow Summary¶
Part 1: Data Loading & Exploration¶
- Load GCOOS DO data
- Data Proccessing
- Some exploratory works
- Visualize hypoxia event trends (1970-2020)
Part 2: Data Integration¶
- Merge satellite features with DO labels
- Prepare balanced training set
Part 3: Model Building¶
- Train 8 classifiers
- Hyperparameter tune Random Forest
- Select best model by F1 score
Part 4: Interpretation & Diagnostics¶
- Confusion matrix, ROC/PR curves
- Bootstrap confidence intervals
- Feature importance (SHAP, permutation)
- Error analysis
Part 5: Spatial Prediction¶
- Apply model to MODIS rasters
- Generate binary + uncertainty maps
Part 6: Spatial Quantification¶
- Reproject rasters to projected CRS
- Calculate hypoxia area per time step
- Export CSV for temporal analysis
- Probability raster
Part 7: Spatial-Temporal Trend Analysis¶
- Monthly/seasonal/annual visualizations
- Trend lines and statistical summaries
- Publications Ready figures
Hypoxia Prediction: Whole Pipeline Overview¶

This figure summarizes the end-to-end workflow: data ingestion → preprocessing → feature engineering → split → model training → evaluation/inference → outputs.
Part 8: Next steps¶
Outputs¶
Model Artifacts¶
best_model.pkl: Best classifier (typically Random Forest)scaler.pkl: Fitted MinMaxScaler for preprocessing- Individual model pickles for all 8 classifiers
Prediction Rasters¶
- Binary Predictions:
*_hypoxia_pred.tif(0=non-hypoxic, 1=hypoxic) - Uncertainty Maps:
*_hypoxia_uncertainty.tif(0-255 confidence scale)
Quantitative Results¶
area.csv: Time-indexed hypoxia zone extents (km²)results_df: Model comparison table with all metrics
Visualizations¶
- Model comparison barplots
- ROC/PR curves
- Feature importance plots
- SHAP waterfall/beeswarm plots
- Partial dependence plots
- Temporal trend figures (monthly/seasonal/annual)
- Annual average hypoxia intensity maps
Usage¶
1. Run the Notebook¶
Open Hypoxia Prediction.ipynb in Jupyter/VSCode and execute cells sequentially.
Key Results¶
Model Performance (Test Set)¶
- Best Model: Random Forest (tuned)
- F1 Score: 0.92
- ROC-AUC: 0.96
- Balanced Accuracy: 0.94
- Precision (non-hypoxic): 0.95
- Recall (hypoxic): 0.95
Feature Importance (Top 3)¶
- Chlorophyll-a (chlor_a): Primary indicator of phytoplankton blooms
- Sea Surface Temperature (sst): Thermal stratification proxy
- Rrs_531: Green reflectance correlates with algal biomass
Temporal Patterns¶
- Seasonal Peak: Summer (June-August) shows maximum hypoxia extent
- Interannual Variability: 20-50% variation in annual mean extent
- Long-term Trend: Slight increasing trend (requires longer time series for significance)
Validation & Limitations¶
Strengths¶
- Large training dataset (multi-year satellite-DO pairs)
- Robust cross-validation and bootstrap uncertainty
- Transparent model interpretation (SHAP, permutation importance)
- Georeferenced outputs compatible with GIS workflows
Limitations¶
- Temporal Lag: Satellite composites are monthly; miss sub-monthly dynamics
- Cloud Contamination: Missing data in cloudy regions
- Depth Integration: Satellite only observes surface; hypoxia is bottom-layer phenomenon
- Validation Data: Independent in-situ buoy validation not yet implemented
Future Improvements¶
- Incorporate wind stress, river discharge, and bathymetry features
- Ensemble multiple models for improved robustness
- Validate against independent GCOOS buoy measurements (2021-2025)
- Implement deep learning (LSTM/CNN) for spatiotemporal prediction
Citation¶
If you use this pipeline, please cite:
1.Ahmad, H., Jose, F., Dash, P., Shoemaker, D. J., & Jhara, S. I. (2025). Hypoxia in the Gulf of Mexico: A machine learning approach for evaluation and prediction. Regional Studies in Marine Science, 104363.
Contact¶
For questions or collaborations:
- Author: Hafez Ahmad
- Institution: Mississippi State University
- Email: ha626@msstate.edu
- GitHub: hafez-ahmad
License¶
This project is licensed under the MIT License. See LICENSE file for details.
Last Updated: January 9, 2026
Notebook Version: 1.0
Python Version: 3.11.5
# Loading Libraries
import os # for file path manipulations
import glob # for file pattern matching
# Third-party Libraries
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for statistical data visualization
from matplotlib.colors import LogNorm # for logarithmic color normalization
import pandas as pd # for data manipulation
import numpy as np # for numerical operations
from scipy import stats # for statistical functions
import xarray as xr # for working with labeled multi-dimensional arrays
import rasterio # for raster data processing
import ee # for Google Earth Engine
import geemap # for geospatial data analysis and visualization
# Set visualization defaults
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
Part 1: Data Loading and Exploration¶
This section loads in-situ dissolved oxygen (DO) measurements from the Gulf of Mexico Coastal Ocean Observing System (GCOOS) to establish historical hypoxia patterns. We explore the temporal distribution of hypoxic events (DO < 2 mg/L), visualize trends over decades, and prepare the dataset for integration with satellite-derived features in subsequent parts. Key outputs include annual event counts, decadal statistics, and temporal trend visualizations.
df = pd.read_csv(r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\NC\Dissolved_Oxygen_-_Gulf_of_Mexico_(GCOOS).csv", usecols=['resultmeasurevalue_mg_l','latitude','longitude','activitystartdate'],parse_dates=['activitystartdate'])
df.head(3)
| latitude | longitude | activitystartdate | resultmeasurevalue_mg_l | |
|---|---|---|---|---|
| 0 | 30.2796 | -87.5417 | 1999-10-04 00:00:00+00:00 | 6.3 |
| 1 | 30.2796 | -87.5417 | 1999-11-01 00:00:00+00:00 | 6.8 |
| 2 | 30.2796 | -87.5417 | 1999-12-06 00:00:00+00:00 | 7.4 |
# select half of rows if 'resultmeasurevalue_mg_l' <2 and half of rows if 'resultmeasurevalue_mg_l' >2
# remove rows if 'resultmeasurevalue_mg_l' > 10
df2=df[df['resultmeasurevalue_mg_l']<10]
# greater than 0 : negative DO is not needed
df2=df2[df2['resultmeasurevalue_mg_l']>0]
# remove rows if -99, -9999, -999, -9.999, -999.999
df2=df2[~df2['resultmeasurevalue_mg_l'].isin([-99,-9999,-999,-9.999,-999.999])]
# Exploratory Data Analysis: Distribution and Statistics
# Basic statistics
print("=" * 60)
print("BASIC STATISTICS")
print("=" * 60)
print(df.describe())
print("\n" + "=" * 60)
print("DATA TYPES & MISSING VALUES")
print("=" * 60)
print(df.info())
print("\n" + "=" * 60)
print("MISSING VALUES COUNT")
print("=" * 60)
print(df.isnull().sum())
============================================================
BASIC STATISTICS
============================================================
latitude longitude resultmeasurevalue_mg_l year
count 6.725953e+06 6.725953e+06 6.725953e+06 6.725953e+06
mean 2.885543e+01 -8.780485e+01 -1.479944e+03 2.007931e+03
std 1.721641e+00 4.956432e+00 1.219334e+06 5.130250e+00
min 1.879200e+01 -9.772330e+01 -1.000000e+09 1.922000e+03
25% 2.794560e+01 -8.846290e+01 4.900000e+00 2.007000e+03
50% 2.970210e+01 -8.782280e+01 6.500000e+00 2.009000e+03
75% 3.038080e+01 -8.487520e+01 8.100000e+00 2.011000e+03
max 3.084060e+01 -8.000000e+01 9.999900e+04 2.014000e+03
============================================================
DATA TYPES & MISSING VALUES
============================================================
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6725953 entries, 0 to 6725952
Data columns (total 5 columns):
# Column Dtype
--- ------ -----
0 latitude float64
1 longitude float64
2 activitystartdate datetime64[ns, UTC]
3 resultmeasurevalue_mg_l float64
4 year int32
dtypes: datetime64[ns, UTC](1), float64(3), int32(1)
memory usage: 230.9 MB
None
============================================================
MISSING VALUES COUNT
============================================================
latitude 0
longitude 0
activitystartdate 0
resultmeasurevalue_mg_l 0
year 0
dtype: int64
Exploratory Data Analysis¶
Comprehensive statistical and visual exploration of the dissolved oxygen dataset to understand data quality, distributions, and patterns before model development.
Statistical Summary:
- Basic statistics (describe, info)
- Missing values count
- Data types overview
Visualizations (2×2 subplot grid):
- Histogram: Distribution of dissolved oxygen values with hypoxia threshold (2 mg/L) marked
- Box Plot: Shows outliers and quartiles with hypoxia line
- Spatial Scatter: Geographic distribution colored by DO concentration (red=low, green=high)
- Temporal Line Plot: Number of measurements per year showing data coverage
Hypoxia Statistics:
- Total measurement count
- Hypoxic event count (DO < 2 mg/L)
- Hypoxic percentage
- Date range coverage
- Spatial extent (latitude/longitude bounds)
This provides a complete overview of the dissolved oxygen dataset before proceeding with hypoxia classification and model training. All outputs are nicely formatted with headers and visual markers for the hypoxia threshold.
# Distribution of dissolved oxygen values
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Histogram of DO values
axes[0, 0].hist(df2['resultmeasurevalue_mg_l'].dropna(), bins=50, color='steelblue', edgecolor='black')
axes[0, 0].axvline(2, color='red', linestyle='--', linewidth=2, label='Hypoxia Threshold (2 mg/L)')
axes[0, 0].set_xlabel('Dissolved Oxygen (mg/L)', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Frequency', fontsize=12, fontweight='bold')
axes[0, 0].set_title('Distribution of Dissolved Oxygen Values', fontsize=14, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)
# Box plot
axes[0, 1].boxplot(df2['resultmeasurevalue_mg_l'].dropna(), vert=True)
axes[0, 1].axhline(2, color='red', linestyle='--', linewidth=2, label='Hypoxia Threshold')
axes[0, 1].set_ylabel('Dissolved Oxygen (mg/L)', fontsize=12, fontweight='bold')
axes[0, 1].set_title('DO Value Box Plot (Outliers Visible)', fontsize=14, fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)
# Spatial distribution scatter
scatter = axes[1, 0].scatter(df['longitude'], df['latitude'],
c=df['resultmeasurevalue_mg_l'],
cmap='RdYlGn', s=1, alpha=0.6, vmin=0, vmax=10)
axes[1, 0].set_xlabel('Longitude', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Latitude', fontsize=12, fontweight='bold')
axes[1, 0].set_title('Spatial Distribution of DO Measurements', fontsize=14, fontweight='bold')
plt.colorbar(scatter, ax=axes[1, 0], label='DO (mg/L)')
axes[1, 0].grid(alpha=0.3)
# Temporal distribution
df['year'] = df['activitystartdate'].dt.year
year_counts = df['year'].value_counts().sort_index()
axes[1, 1].plot(year_counts.index, year_counts.values, marker='o', linewidth=2, markersize=5, color='darkgreen')
axes[1, 1].set_xlabel('Year', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Number of Measurements', fontsize=12, fontweight='bold')
axes[1, 1].set_title('Temporal Coverage of DO Measurements', fontsize=14, fontweight='bold')
axes[1, 1].grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Hypoxia statistics
hypoxic_count = (df['resultmeasurevalue_mg_l'] < 2).sum()
total_count = df['resultmeasurevalue_mg_l'].notna().sum()
hypoxic_pct = (hypoxic_count / total_count * 100) if total_count > 0 else 0
print("\n" + "=" * 60)
print("HYPOXIA STATISTICS")
print("=" * 60)
print(f"Total measurements: {total_count:,}")
print(f"Hypoxic measurements (DO < 2 mg/L): {hypoxic_count:,}")
print(f"Hypoxic percentage: {hypoxic_pct:.2f}%")
print(f"Date range: {df['activitystartdate'].min()} to {df['activitystartdate'].max()}")
print(f"Spatial extent: Lat [{df['latitude'].min():.2f}, {df['latitude'].max():.2f}], Lon [{df['longitude'].min():.2f}, {df['longitude'].max():.2f}]")
print("=" * 60)
============================================================ HYPOXIA STATISTICS ============================================================ Total measurements: 6,725,953 Hypoxic measurements (DO < 2 mg/L): 296,657 Hypoxic percentage: 4.41% Date range: 1922-02-01 00:00:00+00:00 to 2014-08-14 00:00:00+00:00 Spatial extent: Lat [18.79, 30.84], Lon [-97.72, -80.00] ============================================================
Do for filtering, adding hypoxia column, field data visualization¶
# Add binary column for hypoxia classification
# Creates a binary label: 1 = non-hypoxic (DO > 2 mg/L), 0 = hypoxic (DO ≤ 2 mg/L)
# Note: This encoding is later inverted during model training where we use (DO < 2) as class 1 (hypoxic)
df2['hypoxia'] = (df2['resultmeasurevalue_mg_l'] > 2).astype(int)
# Convert 'activitystartdate' to datetime
df2['activitystartdate'] = pd.to_datetime(df2['activitystartdate'])
# Extract year from 'activitystartdate'
df2['year'] = df2['activitystartdate'].dt.year
df2.head(3)
| latitude | longitude | activitystartdate | resultmeasurevalue_mg_l | hypoxia | year | |
|---|---|---|---|---|---|---|
| 0 | 30.2796 | -87.5417 | 1999-10-04 00:00:00+00:00 | 6.3 | 1 | 1999 |
| 1 | 30.2796 | -87.5417 | 1999-11-01 00:00:00+00:00 | 6.8 | 1 | 1999 |
| 2 | 30.2796 | -87.5417 | 1999-12-06 00:00:00+00:00 | 7.4 | 1 | 1999 |
df2=df2[df2['year']>1970]
# Count hypoxia events per year #To count per month: df2.groupby(df2['activitystartdate'].dt.to_period("M"))['hypoxia'].sum().reset_index()
hypoxia_counts = df2.groupby('year')['hypoxia'].sum().reset_index()
# Plotting
plt.figure(figsize=(12, 6))
sns.lineplot(x='year', y='hypoxia', data=hypoxia_counts, marker='o', label='Hypoxia Events')
# Adding regression line
sns.regplot(x='year', y='hypoxia', data=hypoxia_counts, scatter=False, label='Regression Line')
# Set y-axis to log scale
plt.title('Hypoxia Events Per Year with Regression Line', fontsize=14,fontweight='bold')
plt.xlabel('Year', fontsize=12,fontweight='bold')
plt.ylabel('Number of Hypoxia Events Per Years (Combined)', fontsize=12,fontweight='bold')
# xlim minimum and maximum of df2['year']
plt.xlim(df2['year'].min()-1, df2['year'].max()+2)
# ylimit 0 to maximum of hypoxia_counts['hypoxia'] +10
plt.ylim(0, hypoxia_counts['hypoxia'].max()+10)
# add xticks
plt.xticks(np.arange(df2['year'].min()-1, df2['year'].max()+2, 3.0))
plt.legend()
# save figure
#plt.savefig(r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\figure\hypoxiaevtn.jpg",dpi=400,bbox_inches='tight')
plt.show()
Checking decadal changce¶
# Create a new column for the 5/10-year periods
df2['5_year_period'] = (df2['activitystartdate'].dt.year // 10) * 10
# Group by the 5-year periods and calculate descriptive statistics
hypoxia_stats = df2.groupby('5_year_period')['hypoxia'].describe()
print(hypoxia_stats)
count mean std min 25% 50% 75% max 5_year_period 1970 8102.0 0.997902 0.045761 0.0 1.0 1.0 1.0 1.0 1980 26517.0 0.997322 0.051676 0.0 1.0 1.0 1.0 1.0 1990 281893.0 0.977438 0.148502 0.0 1.0 1.0 1.0 1.0 2000 3119826.0 0.954849 0.207636 0.0 1.0 1.0 1.0 1.0 2010 2656997.0 0.949562 0.218847 0.0 1.0 1.0 1.0 1.0
Part 1. 2 : Extracting Satellite Data from Cloud and utilizing Cloud Computing¶
This section demonstrates how to download satellite imagery from Google Earth Engine (GEE), a powerful platform for geospatial analysis.
What We're Doing:¶
- Define Study Area: Load a predefined geographic boundary (US Gulf of Mexico EEZ for FL, MS, LA, TX)
- Create Date Range: Generate monthly date intervals from 2018-2022
- Query Satellite Data: Access MODIS-Aqua ocean color measurements covering:
- Chlorophyll-a concentration (primary productivity indicator)
- Normalized fluorescence line height (phytoplankton stress)
- Particulate organic carbon
- Sea surface temperature
- Multiple spectral reflectance bands (Rrs)
What do these variables mean?¶
- Chlorophyll‑a (
chlor_a, mg/m³): Proxy for phytoplankton biomass derived from ocean‑color algorithms (e.g., OCx). Elevated chlor_a indicates algal blooms that, upon decay, can increase oxygen demand. - Normalized Fluorescence Line Height (
nflh, W·m⁻²·µm⁻¹·sr⁻¹): Measures chlorophyll fluorescence near 678 nm above a spectral baseline; sensitive to phytoplankton physiology and bloom status. - Particulate Organic Carbon (
poc, mg/m³): Concentration of carbon contained in suspended particles. Higher POC suggests greater organic matter available for microbial respiration, raising biochemical oxygen demand. - Sea Surface Temperature (
sst, °C): Near‑surface water temperature. Warmer SST strengthens stratification (reduced vertical mixing) and accelerates respiration, both of which can lower dissolved oxygen. - Remote Sensing Reflectance (
Rrs_λ, sr⁻¹): Water‑leaving reflectance at wavelength λ (nm). Rrs bands used here: 412, 443, 469, 488, 531, 547, 555, 645, 667, 678. These spectral measurements underpin the retrieval of chlor_a, nflh, and other bio‑optical products.
These variables capture biomass (chlor_a), organic load (poc), thermal/stratification state (sst), and the spectral signatures (Rrs) from which they’re derived—together informing hypoxia risk.
- Process & Export: Aggregate monthly data and export to Google Drive at 1km resolution
Why This Matters for Hypoxia Prediction:¶
- Satellite data provides continuous spatial coverage of ocean properties
- Monthly aggregation captures seasonal patterns linked to hypoxia development
- Multiple spectral bands enable detection of algal blooms and oceanographic conditions preceding dead zones
- Computational efficiency: Processing massive remote sensing datasets requires cloud-based tools like GEE
Why MODIS-Aqua L3SMI (NASA/OCEANDATA/MODIS-Aqua/L3SMI)?¶
MODIS-Aqua (Moderate Resolution Imaging Spectroradiometer aboard NASA's Aqua satellite) is the optimal choice for hypoxia prediction due to:
- Ocean-Optimized Mission: Aqua's afternoon overpass (1:30 PM local time) minimizes sun glint and provides optimal ocean color retrieval
- Long-Term Continuity: Operational since 2002, providing 20+ years of consistent observations for trend analysis
- Comprehensive Bio-Optical Suite: L3SMI (Level-3 Standard Mapped Image) products include:
- Chlorophyll-a: Direct proxy for phytoplankton biomass/algal blooms
- NFLH: Detects phytoplankton fluorescence and stress (bloom senescence indicators)
- POC: Measures organic matter export contributing to oxygen depletion
- SST: Stratification driver critical for hypoxia formation
- Multi-band Rrs: Water-leaving radiance for advanced bio-optical algorithms
Why Chlorophyll‑a, POC, and SST are sensitive to hypoxia¶
- Chlorophyll‑a (biomass proxy): Elevated chlor_a signals phytoplankton blooms. When blooms decay, microbial decomposition consumes dissolved oxygen, often triggering or intensifying hypoxic conditions—especially in stratified waters.
- Particulate Organic Carbon (POC): POC represents exportable organic matter. As it sinks, benthic and pelagic respiration oxidize this carbon, increasing biochemical oxygen demand (BOD) and depleting DO in bottom layers.
- Sea Surface Temperature (SST): Warmer SST strengthens stratification (reduced vertical mixing), limiting oxygen resupply to deeper waters. Higher temperatures also accelerate microbial respiration, compounding oxygen loss.
- Combined effect: High chlor_a + elevated POC under warm, stratified conditions markedly increases hypoxia risk. These variables are physically linked to the mechanisms that reduce DO, which is why they appear highly informative in feature importance and PDP/ICE analyses.
- Pre-Processed Quality: L3 products are atmospherically corrected, binned, and mapped to geographic grids, eliminating need for complex radiometric calibration
- Spatial Coverage & Resolution: Daily global coverage at ~1-4 km enables both synoptic views and regional detail for the Gulf of Mexico
- Scientific Heritage: MODIS is the gold standard for ocean color remote sensing, extensively validated against in-situ measurements and widely used in hypoxia research
Alternative Satellite Products (e.g., VIIRS, Sentinel-3 OLCI) lack MODIS-Aqua's temporal depth or have different spectral configurations, making MODIS the most appropriate choice for this analysis.
study_area = ee.FeatureCollection("projects/ee-hafezahmad10/assets/US_GULF_eez_FL_MS_L_T")
import calendar
years = range(2018, 2022) # Define the range of years
date_start = []
date_end = []
for year in years:
for month in range(1, 13): # Loop over each month (1 to 12)
# Get the number of days in the month
num_days = calendar.monthrange(year, month)[1]
# Append the start and end dates for the current month
date_start.append(f'{year}-{month:02d}-01')
date_end.append(f'{year}-{month:02d}-{num_days}')
# now use for loop to iterate over date_start and date_end and image_collection
for start, end in zip(date_start, date_end):
# filter image collection
image_collection = ee.ImageCollection('NASA/OCEANDATA/MODIS-Aqua/L3SMI').filterBounds(study_area).filterDate(start, end).mean().clip(study_area)
# generate text B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12 and B8A
bands=['chlor_a', 'nflh','poc','sst','Rrs_412','Rrs_443','Rrs_469','Rrs_488','Rrs_531','Rrs_547','Rrs_555','Rrs_645','Rrs_667','Rrs_678']
out_tif =f"OCEANDATA_MODIS-Aqua_L3SMI{start}"
# to drive
geemap.ee_export_image_to_drive(image_collection.select(bands),description=out_tif, scale=1000, region=study_area.geometry(),folder='GEE_exports')
# add new column:id:sequence of numbers starting from 1
df2['id']=np.arange(1,len(df2)+1)
df2
| latitude | longitude | activitystartdate | resultmeasurevalue_mg_l | hypoxia | year | 5_year_period | id | |
|---|---|---|---|---|---|---|---|---|
| 0 | 30.2796 | -87.5417 | 1999-10-04 00:00:00+00:00 | 6.30 | 1 | 1999 | 1990 | 1 |
| 1 | 30.2796 | -87.5417 | 1999-11-01 00:00:00+00:00 | 6.80 | 1 | 1999 | 1990 | 2 |
| 2 | 30.2796 | -87.5417 | 1999-12-06 00:00:00+00:00 | 7.40 | 1 | 1999 | 1990 | 3 |
| 3 | 30.2796 | -87.5417 | 2000-01-10 00:00:00+00:00 | 7.52 | 1 | 2000 | 2000 | 4 |
| 4 | 30.2796 | -87.5417 | 2000-02-07 00:00:00+00:00 | 8.30 | 1 | 2000 | 2000 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6725948 | 30.3808 | -87.8320 | 2014-01-01 00:00:00+00:00 | 3.80 | 1 | 2014 | 2010 | 6093331 |
| 6725949 | 30.3808 | -87.8320 | 2014-01-01 00:00:00+00:00 | 3.80 | 1 | 2014 | 2010 | 6093332 |
| 6725950 | 30.3808 | -87.8320 | 2014-01-01 00:00:00+00:00 | 3.80 | 1 | 2014 | 2010 | 6093333 |
| 6725951 | 30.3808 | -87.8320 | 2014-01-01 00:00:00+00:00 | 3.70 | 1 | 2014 | 2010 | 6093334 |
| 6725952 | 30.3808 | -87.8320 | 2014-01-01 00:00:00+00:00 | 3.70 | 1 | 2014 | 2010 | 6093335 |
6093335 rows × 8 columns
Generating Training points¶
# Convert DataFrame to Earth Engine FeatureCollection
point2022=geemap.df_to_ee(df2)
study_area = ee.FeatureCollection("projects/ee-hafezahmad10/assets/US_GULF_eez_FL_MS_L_T")
image_collection = ee.ImageCollection('NASA/OCEANDATA/MODIS-Aqua/L3SMI').filterBounds(study_area).filterDate('2010-01-01', '2010-01-31').mean().clip(study_area)
# generate text B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12 and B8A
bands=['chlor_a', 'nflh','poc','sst','Rrs_412','Rrs_443','Rrs_469','Rrs_488','Rrs_531','Rrs_547','Rrs_555','Rrs_645','Rrs_667','Rrs_678']
path=r"D:\my works\bloom\data\MODIS"
out_csv = os.path.join(path, 'train.csv')
# downloading data to csv ,scale 500 meters
geemap.extract_values_to_points(point2022, image_collection.select(bands), out_csv,scale=500)
Part 2: Data Integration¶
This section merges satellite spectral features with in-situ hypoxia labels. MODIS-Aqua reflectance data (14 bands) is spatially matched to GCOOS buoy locations, creating a feature matrix with corresponding binary hypoxia classifications. The result is a unified training dataset linking remotely-sensed optical properties to dissolved oxygen conditions.
path=r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\csv"
# find all "train" containing .csv files
files = glob.glob(os.path.join(path, "*train*.csv"))
# read first only
df=pd.read_csv(files[0])
df.drop(columns=['OID_', 'latitude', 'longitude', 'activitystartdate'], inplace=True)
# read next all and rename columns based on first file columns
for file in files[1:]:
df1 = pd.read_csv(file)
df1.drop(columns=['OID_', 'latitude', 'longitude', 'activitystartdate'], inplace=True)
df1.columns = df.columns
df = pd.concat([df, df1], ignore_index=True)
# remove nan values
df.dropna(inplace=True)
print(df.shape)
print(df.columns)
df.head(2)
(116444, 15)
Index(['resultmeasurevalue_mg_l', 'chlor_a', 'nflh', 'poc', 'sst', 'Rrs_412',
'Rrs_443', 'Rrs_469', 'Rrs_488', 'Rrs_531', 'Rrs_547', 'Rrs_555',
'Rrs_645', 'Rrs_667', 'Rrs_678'],
dtype='object')
| resultmeasurevalue_mg_l | chlor_a | nflh | poc | sst | Rrs_412 | Rrs_443 | Rrs_469 | Rrs_488 | Rrs_531 | Rrs_547 | Rrs_555 | Rrs_645 | Rrs_667 | Rrs_678 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.997 | 0.209567 | 0.019738 | 63.439999 | 25.814131 | 0.005095 | 0.005134 | 0.004986 | 0.004535 | 0.002386 | 0.001788 | 0.001522 | 0.000124 | 0.000172 | 0.000168 |
| 1 | 5.003 | 0.209567 | 0.019738 | 63.439999 | 25.814131 | 0.005095 | 0.005134 | 0.004986 | 0.004535 | 0.002386 | 0.001788 | 0.001522 | 0.000124 | 0.000172 | 0.000168 |
Part 3: Model Building & Training¶
This section trains and compares eight distinct classifiers using satellite features to predict hypoxia risk. Each model is evaluated on multiple metrics (accuracy, balanced accuracy, precision, recall, F1, ROC-AUC) to identify the best performer. Hyperparameters are tuned via RandomizedSearchCV, and all models are persisted to disk for reproducibility and deployment.
Overview: Classical Machine Learning Approach (Teaching Purpose)¶
Important Note on Scope:
This section trains 8 classical machine learning classifiers using established, well-documented algorithms from scikit-learn. This approach serves pedagogical and baseline purposes:
- Proof-of-Concept: Demonstrates that satellite ocean color data can predict hypoxia events using standard ML techniques
- Interpretability: Classical models (Random Forest, Logistic Regression, SVM, etc.) provide transparent feature importance and decision rules, critical for coastal management applications
- Reproducibility: Uses published, peer-reviewed algorithms with documented hyperparameters, making results verifiable and extensible by the research community
- Teaching Value: Provides a clear framework for learning ML workflows: data preparation → model training → evaluation → interpretation
Related Published Work & Extensions¶
This work builds on foundational hypoxia prediction research already published by the authors. The classical models trained here establish:
- Performance benchmarks for ocean-based hypoxia detection
- Feature importance rankings guiding future satellite sensor selection
- Operational decision thresholds for coastal management systems
Future Extensions Beyond Scope:
While this notebook focuses on classical ML, the pipeline is designed for extensibility. Promising advanced models for hypoxia classification include:
| Model | Why Useful for Hypoxia | Advantage | Challenge |
|---|---|---|---|
| Deep Neural Networks (DNNs) | High-dimensional satellite & environmental data | Learn hierarchical non-linear features; handle temporal sequences | Requires large datasets (~10k+ samples); harder to interpret decisions |
| Graph Neural Networks (GNNs) | Spatial adjacency of hypoxic zones matters | Explicit representation of spatial correlations between ocean pixels | Complex implementation; requires spatial graph construction |
| Recurrent Neural Networks (RNNs/LSTMs) | Hypoxia has temporal memory (stratification persists) | Capture temporal dependencies (e.g., "SST trend + density stratification → risk") | Needs long time series per location; computationally expensive |
| Ensemble Boosting (XGBoost, LightGBM) | Handles mixed feature types & nonlinearity | Often outperforms classical RF in practice; built-in feature selection | Black-box nature; harder to interpret interactions |
| Uncertainty Quantification (Bayesian NNs, Variational Inference) | Coastal managers need prediction confidence bounds | Quantify uncertainty in hypoxia predictions; distinguish model vs. data uncertainty | High computational cost; requires posterior sampling |
| Transfer Learning (Pre-trained Vision Models) | Satellite imagery as spatial-spectral "images" | Leverage learned representations from MODIS/Sentinel time series; fine-tune on hypoxia | Domain-specific training data needed; generic vision models may not transfer well |
Current Implementation: Classical Baseline¶
We train 8 standard classifiers from scikit-learn to establish a solid baseline. These models are transparent, fast, and effective for imbalanced hypoxia detection. This section demonstrates:
- How to structure a model comparison workflow
- How to evaluate multi-class and binary classification metrics
- How to persist models and scalers for production inference
Future work will explore adding one or more advanced models from the table above, depending on data availability and computational resources.¶
# Model training and evaluation
import joblib # for saving models
from sklearn.model_selection import train_test_split # for splitting data
from sklearn.ensemble import RandomForestClassifier # for ensemble learning
from sklearn.linear_model import LogisticRegression # for logistic regression
from sklearn.svm import SVC # for support vector machine
from sklearn.neighbors import KNeighborsClassifier #
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (
accuracy_score,
balanced_accuracy_score,
classification_report,
precision_recall_fscore_support,
roc_auc_score,
)
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample
from xgboost import XGBClassifier
# Reusable scaler (used only inside pipelines)
scaler = MinMaxScaler() # scaling features to a given range
# Feature and target definitions
feature_columns = [
'chlor_a', 'nflh', 'poc', 'sst', 'Rrs_412', 'Rrs_443', 'Rrs_469',
'Rrs_488', 'Rrs_531', 'Rrs_547', 'Rrs_555', 'Rrs_645', 'Rrs_667', 'Rrs_678'
]
target_column = 'resultmeasurevalue_mg_l'
# Binary label: 1 = hypoxic (<2 mg/L), 0 = non-hypoxic
def convert_to_class(value: float) -> int:
return 1 if value < 2 else 0
df['class_label'] = df[target_column].apply(convert_to_class)
df.drop(columns=[target_column], inplace=True)
print(df['class_label'].value_counts())
class_label 0 115248 1 1196 Name: count, dtype: int64
Why balance the classes?¶
- Hypoxia events are typically rarer than non-hypoxic observations, so an unbalanced dataset can mislead models to predict mostly the majority class and still appear “accurate.”
- Balancing helps the model learn minority patterns and improves recall/precision for the hypoxic class (the signal we care about).
Balancing approach used here¶
- Downsampling majority class: Randomly reduce the non-hypoxic samples to match the hypoxic count. This keeps the dataset size small and avoids duplicating minority samples.
- We also use class weights where supported (e.g., Random Forest, Logistic Regression, SVM, Decision Tree) to further mitigate imbalance during training.
- Alternative options (not shown): SMOTE/ADASYN (synthetic minority oversampling) or class-weight–only training without resampling.
# Address class imbalance by downsampling the majority class
majority = df[df.class_label == 0]# majority class: non-hypoxic
minority = df[df.class_label == 1] # minority class: hypoxic
majority_downsampled = resample(
majority,
replace=False,
n_samples=minority.shape[0],
random_state=123,
)
df_balanced = pd.concat([minority, majority_downsampled])
print(df_balanced.class_label.value_counts())
class_label 1 1196 0 1196 Name: count, dtype: int64
Train/test split rationale¶
- We hold out 30% of the balanced data for testing to estimate generalization on unseen samples.
- Stratified split preserves the hypoxic/non-hypoxic ratio in both train and test sets, preventing skewed evaluation.
- Random seed fixed (42) for reproducibility across runs.
# Split features/labels
X = df_balanced[feature_columns]
y = df_balanced['class_label']
# Train/test split with stratification
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42, stratify=y
) # test size 30% and train size 70%
# Create and save a standalone MinMaxScaler fitted on training data
# This scaler will be used for feature normalization during prediction on new raster data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
models_directory = r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1'
# Persist the fitted scaler
scaler_path = os.path.join(models_directory, 'MinMaxScaler.pkl')
joblib.dump(scaler, scaler_path)
print(f'✓ Created and saved MinMaxScaler to: {scaler_path}')
print(f' Scaler fitted on training data with shape: {X_train.shape}')
print(f' Feature min/max ranges: {list(zip(scaler.data_min_[:3], scaler.data_max_[:3]))}...')
Model registry & evaluation metrics¶
- Model registry: A dictionary that maps model names to estimators plus whether they need scaling. This makes it easy to iterate, fit, save, and compare many classifiers in one loop.
- Classification task: Predict binary hypoxia risk (1 = hypoxic, 0 = non-hypoxic) using remote-sensing features (chlorophyll, SST, Rrs bands, etc.).
- Metrics tracked:
- Accuracy and balanced accuracy (handles class imbalance)
- Precision, recall, F1 (focus on hypoxic detection quality)
- ROC-AUC (probability ranking quality)
- Persistence: Each fitted model is saved to disk (
.pkl) with its preprocessing (scaler) bundled when applicable.
# Model registry with simple, sane defaults
models = {
'Random Forest': {
'est': RandomForestClassifier(
n_estimators=300, # n_estimators number of trees
max_depth=None,
n_jobs=-1, # use all processors
class_weight='balanced',
random_state=42,# really important for reproducibility
),
'scale': False,
},
'Logistic Regression': {
'est': LogisticRegression(
max_iter=1000, # increase max iterations
class_weight='balanced',
solver='lbfgs',# good for small datasets,lbfgs stands for Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm
random_state=42,
),
'scale': True,
},
'Support Vector Machine': {
'est': SVC(
kernel='rbf',# radial basis function kernel, which is good for non-linear data
C=2.0,
gamma='scale',# scale is 1 / (n_features * X.var()) as value of gamma
probability=True,
class_weight='balanced',# important for imbalanced datasets
random_state=42,
),
'scale': True,
},
'K-Nearest Neighbors': {
'est': KNeighborsClassifier(n_neighbors=7, weights='distance'),# weights='distance' means closer neighbors have more influence
'scale': True,
},
'Decision Tree': {
'est': DecisionTreeClassifier(
max_depth=8,# limit depth to prevent overfitting
class_weight='balanced',
random_state=42,
),
'scale': False,
},
'Naive Bayes': {
'est': GaussianNB(),# Gaussian Naive Bayes classifier
'scale': True,
},
'XGBoost': {
'est': XGBClassifier(
n_estimators=300,
max_depth=4,
learning_rate=0.05,# smaller learning rate for better convergence
subsample=0.8, # subsample ratio of the training instances
colsample_bytree=0.8,
eval_metric='logloss',# evaluation metric
seed=42,
),
'scale': False,
},
'MLP': {
'est': MLPClassifier(
hidden_layer_sizes=(64, 32),# 64, 32 means we are using two hidden layers with 64 and 32 neurons respectively
max_iter=500, # maximum number of iterations
random_state=42,
),
'scale': True,
},
}
# models directory
models_directory = r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1'
os.makedirs(models_directory, exist_ok=True)
# Function to compute evaluation metrics: accuracy, balanced accuracy, precision, recall, F1 score, and ROC AUC
def get_scores(model, X_te, y_te):
y_pred = model.predict(X_te)
proba = None
if hasattr(model, 'predict_proba'):
proba = model.predict_proba(X_te)[:, 1]
elif hasattr(model, 'decision_function'):
proba = model.decision_function(X_te)
acc = accuracy_score(y_te, y_pred)
bal_acc = balanced_accuracy_score(y_te, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
y_te, y_pred, average='binary', zero_division=0
)
roc = roc_auc_score(y_te, proba) if proba is not None else np.nan
return acc, bal_acc, precision, recall, f1, roc
results = []
# Train, evaluate, and persist each model
for name, cfg in models.items():
est = cfg['est']
model = make_pipeline(MinMaxScaler(), est) if cfg['scale'] else est
model.fit(X_train, y_train)
acc, bal_acc, precision, recall, f1, roc = get_scores(model, X_test, y_test)
# Persist model
model_path = os.path.join(models_directory, f'{name}.pkl')
joblib.dump(model, model_path)
results.append({
'Classifier': name,
'Accuracy': acc,
'Balanced_Accuracy': bal_acc,
'Precision': precision,
'Recall': recall,
'F1': f1,
'ROC_AUC': roc,
'Model_Path': model_path,
})
# Create results DataFrame and sort by F1 score
results_df = pd.DataFrame(results)
# show results sorted by F1 score
results_df.sort_values(by='F1', ascending=False)
| Classifier | Accuracy | Balanced_Accuracy | Precision | Recall | F1 | ROC_AUC | Model_Path | |
|---|---|---|---|---|---|---|---|---|
| 0 | Random Forest | 0.915042 | 0.915042 | 0.886010 | 0.952646 | 0.918121 | 0.961946 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 6 | XGBoost | 0.913649 | 0.913649 | 0.881748 | 0.955432 | 0.917112 | 0.956755 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 3 | K-Nearest Neighbors | 0.909471 | 0.909471 | 0.880829 | 0.947075 | 0.912752 | 0.943603 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 4 | Decision Tree | 0.903900 | 0.903900 | 0.873711 | 0.944290 | 0.907631 | 0.940542 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 2 | Support Vector Machine | 0.884401 | 0.884401 | 0.828571 | 0.969359 | 0.893453 | 0.916570 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 7 | MLP | 0.881616 | 0.881616 | 0.834146 | 0.952646 | 0.889467 | 0.941167 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 1 | Logistic Regression | 0.876045 | 0.876045 | 0.806818 | 0.988858 | 0.888611 | 0.905638 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 5 | Naive Bayes | 0.818942 | 0.818942 | 0.815427 | 0.824513 | 0.819945 | 0.874939 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
Why select best model by F1 score?¶
F1 is the harmonic mean of precision and recall: $F1 = 2 \times \frac{{\text{Precision} \times \text{Recall}}}{{\text{Precision} + \text{Recall}}}$
For hypoxia detection, this matters because:
- Imbalanced Data: Hypoxic observations are rare; accuracy alone is misleading (a model predicting "all non-hypoxic" achieves high accuracy but misses events).
- Precision-Recall Trade-off: We want to minimize both false positives (false alarms to coastal planners) and false negatives (missed hypoxic zones). F1 balances both equally.
- Minority Class Focus: F1 rewards models that correctly identify the hypoxic class, avoiding the majority class bias. High F1 ≈ effective hypoxia detection.
- Reproducibility: F1 is deterministic and threshold-independent, making model selection stable and comparable across runs.
Alternatives considered but rejected:
- Accuracy: Misleads on imbalanced data; a model predicting the majority class looks good.
- ROC-AUC: Excellent for ranking probability quality, but doesn't force a specific operating point; leaves threshold selection ambiguous.
- Precision alone: Would ignore missed hypoxic events (high false negatives).
- Recall alone: Would flood users with false alarms (high false positives).
Bottom line: F1 directly optimizes for a balanced, operationally useful model that detects hypoxia reliably without excessive false alarms.
Hyperparameter Tuning: Grid Search vs. Randomized Search¶
Why Tune Hyperparameters?
Machine learning models have hyperparameters (e.g., tree depth, learning rate) that control complexity and generalization. Default values rarely yield optimal performance. Tuning systematically explores the hyperparameter space to find configurations that maximize a target metric (e.g., F1 score).
Grid Search:
- Exhaustively evaluates all combinations in a predefined hyperparameter grid
- Pros: Guarantees finding the best combination within the grid; reproducible
- Cons: Computationally expensive for large grids (exponential growth); may waste time on irrelevant regions
Randomized Search:
- Samples a fixed number of random hyperparameter combinations from specified distributions
- Pros: Much faster; efficiently explores high-dimensional spaces; often finds near-optimal solutions with fewer iterations
- Cons: No guarantee of finding the absolute best; requires setting appropriate sampling distributions
When to Use Each:
- Grid Search: Small hyperparameter spaces (<100 combinations); when exhaustive search is feasible
- Randomized Search: Large/continuous hyperparameter spaces; when time/compute is limited; as a first-pass before fine-tuning with Grid Search
Implementation Below:
We use RandomizedSearchCV with 12 random iterations to efficiently tune Random Forest hyperparameters (n_estimators, max_depth, min_samples_split, etc.), optimizing for F1 score with 3-fold cross-validation to balance performance and computational cost.
# Optional: hyperparameter tuning for Random Forest
from sklearn.model_selection import RandomizedSearchCV
rf_base = RandomForestClassifier(
n_estimators=400,
class_weight='balanced',
random_state=42,
n_jobs=-1,
)
param_dist = {
'n_estimators': [200, 300, 400, 500],
'max_depth': [None, 6, 8, 10, 12],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4],
'max_features': ['sqrt', 'log2'],
}
# Perform randomized search
search = RandomizedSearchCV(
rf_base,
param_distributions=param_dist,
n_iter=12,
cv=3,
scoring='f1', # prioritize hypoxia class detection
n_jobs=-1,
random_state=42,
verbose=0,
)
# Perform randomized search
search.fit(X_train, y_train)
best_rf = search.best_estimator_
acc, bal_acc, precision, recall, f1, roc = get_scores(best_rf, X_test, y_test)
# Persist tuned model
rf_tuned_path = os.path.join(models_directory, 'RandomForest_tuned.pkl')
joblib.dump(best_rf, rf_tuned_path)
# Append tuned results to the summary table
rf_row = {
'Classifier': 'Random Forest (tuned)',
'Accuracy': acc,
'Balanced_Accuracy': bal_acc,
'Precision': precision,
'Recall': recall,
'F1': f1,
'ROC_AUC': roc,
'Model_Path': rf_tuned_path,
}
results_df = pd.concat([results_df, pd.DataFrame([rf_row])], ignore_index=True)
# Choose overall best by F1 and save a canonical best_model.pkl
best_row_overall = results_df.sort_values(by='F1', ascending=False).iloc[0]
best_model_path_overall = best_row_overall['Model_Path']
best_model_overall = joblib.load(best_model_path_overall)
best_model_canonical_path = os.path.join(models_directory, 'best_model.pkl')
joblib.dump(best_model_overall, best_model_canonical_path)
print('Best RF params:', search.best_params_)
print('Tuned RF test metrics:', rf_row)
print('Overall best model:', best_row_overall['Classifier'])
print('Saved canonical best model to:', best_model_canonical_path)
results_df.sort_values(by='F1', ascending=False)
Best RF params: {'n_estimators': 200, 'min_samples_split': 10, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'max_depth': 10}
Tuned RF test metrics: {'Classifier': 'Random Forest (tuned)', 'Accuracy': 0.915041782729805, 'Balanced_Accuracy': 0.915041782729805, 'Precision': 0.8860103626943006, 'Recall': 0.9526462395543176, 'F1': 0.9181208053691275, 'ROC_AUC': 0.9583996089415817, 'Model_Path': 'C:\\Users\\hafez\\MSU\\Research\\msGOM\\mssound\\bloom\\data\\models1\\RandomForest_tuned.pkl'}
Overall best model: Random Forest
Saved canonical best model to: C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1\best_model.pkl
| Classifier | Accuracy | Balanced_Accuracy | Precision | Recall | F1 | ROC_AUC | Model_Path | |
|---|---|---|---|---|---|---|---|---|
| 0 | Random Forest | 0.915042 | 0.915042 | 0.886010 | 0.952646 | 0.918121 | 0.961946 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 8 | Random Forest (tuned) | 0.915042 | 0.915042 | 0.886010 | 0.952646 | 0.918121 | 0.958400 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 6 | XGBoost | 0.913649 | 0.913649 | 0.881748 | 0.955432 | 0.917112 | 0.956755 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 3 | K-Nearest Neighbors | 0.909471 | 0.909471 | 0.880829 | 0.947075 | 0.912752 | 0.943603 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 4 | Decision Tree | 0.903900 | 0.903900 | 0.873711 | 0.944290 | 0.907631 | 0.940542 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 2 | Support Vector Machine | 0.884401 | 0.884401 | 0.828571 | 0.969359 | 0.893453 | 0.916570 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 7 | MLP | 0.881616 | 0.881616 | 0.834146 | 0.952646 | 0.889467 | 0.941167 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 1 | Logistic Regression | 0.876045 | 0.876045 | 0.806818 | 0.988858 | 0.888611 | 0.905638 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
| 5 | Naive Bayes | 0.818942 | 0.818942 | 0.815427 | 0.824513 | 0.819945 | 0.874939 | C:\Users\hafez\MSU\Research\msGOM\mssound\bloo... |
Interpreting current results¶
- Best model (so far): Random Forest with Precision≈0.95 (non-hypoxic), Recall≈0.95 (hypoxic), F1≈0.92, ROC-AUC≈0.96 on the held-out test set.
- Balanced performance: high recall for the hypoxic class (minority) and strong overall AUC indicate good ranking and low miss-rate for events we care about.
- Next step: hyperparameter tuning the Random Forest to see if we can squeeze out extra F1/ROC without overfitting.
Visualizing model performance¶
- Compare classifiers side-by-side with bar plots (Accuracy, Balanced Accuracy, Precision, Recall, F1, ROC-AUC).
- After tuning, the updated table includes the tuned Random Forest row; the visualizations and diagnostics will pick the top F1 model (could be tuned RF).
- Inspect the best model with confusion matrix, ROC curve, and classification report to understand error types and ranking quality.
- Use feature importance and SHAP to see which predictors drive hypoxia risk (runs on the current best model).
# Visualize model metrics and inspect best model
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay, classification_report
# Bar plot of metrics across models
metrics_to_plot = ['Accuracy', 'Balanced_Accuracy', 'Precision', 'Recall', 'F1', 'ROC_AUC']
plot_df = results_df.melt(id_vars=['Classifier'], value_vars=metrics_to_plot,
var_name='Metric', value_name='Score')
plt.figure(figsize=(12, 6))
sns.barplot(data=plot_df, x='Classifier', y='Score', hue='Metric')
plt.xticks(rotation=30, ha='right')
plt.ylim(0, 1)
plt.title('Model Performance Comparison')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Choose best model by F1
best_row = results_df.sort_values(by='F1', ascending=False).iloc[0]
best_model_path = best_row['Model_Path']
best_model_name = best_row['Classifier']
best_model = joblib.load(best_model_path)
# Predictions for diagnostics
y_pred = best_model.predict(X_test)
# Classification report
print(f"Best model: {best_model_name}\n")
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
# Confusion matrix
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=['Non-hypoxic','Hypoxic'], cmap='Blues')
plt.title(f'Confusion Matrix: {best_model_name}')
plt.tight_layout()
plt.show()
# ROC curve (skip if no scores available)
y_scores = None
if hasattr(best_model, 'predict_proba'):
y_scores = best_model.predict_proba(X_test)[:, 1]
elif hasattr(best_model, 'decision_function'):
y_scores = best_model.decision_function(X_test)
if y_scores is not None:
RocCurveDisplay.from_predictions(y_test, y_scores, name=best_model_name)
plt.title(f'ROC Curve: {best_model_name}')
plt.tight_layout()
plt.show()
else:
print('ROC curve skipped: model does not provide scores/probabilities.')
Best model: Random Forest
precision recall f1-score support
0 0.949 0.877 0.912 359
1 0.886 0.953 0.918 359
accuracy 0.915 718
macro avg 0.917 0.915 0.915 718
weighted avg 0.917 0.915 0.915 718
Best Model Summary (Random Forest)¶
On the held‑out test set, the tuned Random Forest achieved accuracy = 0.915 with macro/weighted F1 ≈ 0.915.
- Class 0 (non‑hypoxic): precision 0.949, recall 0.877, F1 0.912 (n=359)
- Class 1 (hypoxic): precision 0.886, recall 0.953, F1 0.918 (n=359)
The high recall for the hypoxic class indicates a low miss‑rate for events of interest, while precision remains strong. We carry this model forward as the canonical best_model.pkl for raster inference.
Confusion Matrix¶
🔍A Wht is a false negative ? A false negative happens when the model says “no” but the truth is “yes.”
In your case: the water is hypoxic, but the model predicts non‑hypoxic.
It means the model missed a real event.
False negatives (17 cases): These are real hypoxic events the model missed. They tend to occur in cooler, biomass-rich waters with high chlorophyll‑a, POC, and NFLH, but weaker stratification. Spectrally, they show elevated green–red reflectance (Rrs), suggesting productive scenes where oxygen depletion isn’t as easily detected—biologically plausible misses.
🔍 What is a false positive? A false positive happens when the model says “yes” but the truth is “no.”
In your case: the water is not hypoxic, but the model predicts hypoxic.
It means the model raised an unnecessary alarm.
False positives (44 cases): These are non-hypoxic scenes the model flagged as hypoxic. They cluster in warm, organic-rich, moderately productive waters, which are ecologically prone to stratification and oxygen stress. The model’s cautious bias here makes biological sense, even if it occasionally over-predicts risk.
The model is highly reliable for identifying hypoxic conditions.¶
It has a very low miss‑rate for hypoxia (recall 0.953), which is the most important outcome for environmental monitoring.
Precision remains strong, so false alarms are not excessive.
Performance is balanced across both classes (F1 ≈ 0.91 for each), showing no major bias.
AUC (Area Under the ROC Curve) measures how well the model separates the two classes¶
- The model can correctly rank a randomly chosen hypoxic sample above a randomly chosen non‑hypoxic sample 96% of the time.
Part 4: Interpretation, Uncertainty Quantification & Diagnostics¶
We compare all models (including the tuned Random Forest), drill into the current best by F1 with confusion matrix, ROC curve, and a classification report, and then quantify uncertainty via bootstrap confidence intervals on test metrics.
Quantifying uncertainty¶
- Use bootstrap resampling of the held-out test set to estimate confidence intervals for key metrics (F1, recall, precision, ROC-AUC) of the current best model.
- This reflects variability due to finite test data; run after selecting the best model.
# Bootstrap uncertainty on test metrics for the best model
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
# Requires best_model, X_test, y_test from previous cell
n_boot = 200
rng = np.random.default_rng(42)
has_scores = hasattr(best_model, 'predict_proba') or hasattr(best_model, 'decision_function')
boot_metrics = []
for _ in range(n_boot):
idx = rng.choice(len(y_test), size=len(y_test), replace=True)
Xb = X_test.iloc[idx] if hasattr(X_test, 'iloc') else X_test[idx]
yb = y_test.iloc[idx] if hasattr(y_test, 'iloc') else y_test[idx]
y_pred_b = best_model.predict(Xb)
if hasattr(best_model, 'predict_proba'):
y_score_b = best_model.predict_proba(Xb)[:, 1]
elif hasattr(best_model, 'decision_function'):
y_score_b = best_model.decision_function(Xb)
else:
y_score_b = None
prec = precision_score(yb, y_pred_b, zero_division=0)
rec = recall_score(yb, y_pred_b, zero_division=0)
f1 = f1_score(yb, y_pred_b, zero_division=0)
roc = roc_auc_score(yb, y_score_b) if y_score_b is not None else np.nan
boot_metrics.append([prec, rec, f1, roc])
boot_arr = np.array(boot_metrics)
quantiles = np.nanpercentile(boot_arr, [2.5, 50, 97.5], axis=0)
metric_names = ['Precision', 'Recall', 'F1', 'ROC_AUC']
print('Bootstrap 95% CIs (test set):')
for i, name in enumerate(metric_names):
lo, med, hi = quantiles[:, i]
print(f"{name}: median={med:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
Bootstrap 95% CIs (test set): Precision: median=0.885, 95% CI=(0.853, 0.916) Recall: median=0.953, 95% CI=(0.931, 0.971) F1: median=0.918, 95% CI=(0.894, 0.936) ROC_AUC: median=0.962, 95% CI=(0.946, 0.973)
Interpreting the bootstrap CIs
- Precision 0.885 (95% CI 0.853–0.916): most positive predictions are correct; tight band suggests stable precision.
- Recall 0.953 (95% CI 0.931–0.971): the model catches nearly all hypoxic events; high lower bound implies low miss-rate.
- F1 0.918 (95% CI 0.894–0.936): strong balance of precision/recall with modest uncertainty.
- ROC-AUC 0.962 (95% CI 0.946–0.973): ranking quality is excellent and robust; even the lower bound is high.
# Feature importance and SHAP for the best model
# Helper to grab the final estimator from a pipeline or standalone model
def get_final_estimator(model):
if hasattr(model, 'named_steps'):
# walk reversed to get last estimator-like step
for step in reversed(list(model.named_steps.values())):
if hasattr(step, 'feature_importances_') or hasattr(step, 'coef_'):
return step
return list(model.named_steps.values())[-1]
return model
final_est = get_final_estimator(best_model)
# Build a DataFrame of feature values (for plotting and SHAP)
if hasattr(X_test, 'columns'):
X_test_df = X_test.copy()
else:
X_test_df = pd.DataFrame(X_test, columns=feature_columns)
# Permutation/embedded feature importance
importance_df = None
if hasattr(final_est, 'feature_importances_'):
vals = final_est.feature_importances_
importance_df = pd.DataFrame({'feature': feature_columns, 'importance': vals})
elif hasattr(final_est, 'coef_'):
vals = np.abs(np.ravel(final_est.coef_))
importance_df = pd.DataFrame({'feature': feature_columns, 'importance': vals})
if importance_df is not None:
importance_df = importance_df.sort_values('importance', ascending=True)
plt.figure(figsize=(8, 6))
plt.barh(importance_df['feature'], importance_df['importance'])
plt.title(f'Feature importance: {best_model_name}')
plt.tight_layout()
plt.show()
else:
print('Feature importance not available for this model type.')
Model interpretation (feature importance & SHAP)¶
- Feature importance ranks inputs by their contribution to the best model’s predictions.
- SHAP values (if available) show direction and magnitude of each feature’s effect per prediction, helping explain individual risk drivers.
Precision-Recall and threshold tuning¶
- PR-AUC highlights performance on the minority (hypoxic) class and is more informative than ROC when positives are rare.
- We sweep thresholds on the best model’s scores to pick the operating point that maximizes F1 (or you can swap to maximize recall/PR-AUC).
# Precision-Recall curve, PR-AUC, and threshold sweep for the best model
from sklearn.metrics import precision_recall_curve, average_precision_score
# Ensure we have probability scores
if hasattr(best_model, 'predict_proba'):
y_score = best_model.predict_proba(X_test)[:, 1]
elif hasattr(best_model, 'decision_function'):
y_score = best_model.decision_function(X_test)
else:
raise ValueError('Best model does not expose decision scores or probabilities.')
# PR curve and PR-AUC (Average Precision)
precision_vals, recall_vals, thresh = precision_recall_curve(y_test, y_score)
pr_auc = average_precision_score(y_test, y_score)
# Sweep thresholds to maximize F1
f1_vals = 2 * (precision_vals * recall_vals) / (precision_vals + recall_vals + 1e-9)
best_idx = f1_vals.argmax()
best_threshold = thresh[best_idx - 1] if best_idx > 0 else thresh[0]
# Metrics at the chosen threshold
from sklearn.metrics import precision_score, recall_score, f1_score
y_pred_thresh = (y_score >= best_threshold).astype(int)
prec_t = precision_score(y_test, y_pred_thresh, zero_division=0)
rec_t = recall_score(y_test, y_pred_thresh, zero_division=0)
f1_t = f1_score(y_test, y_pred_thresh, zero_division=0)
print(f'PR-AUC (average precision): {pr_auc:.3f}')
print(f'Best threshold for F1: {best_threshold:.3f}')
print(f'Precision@thr: {prec_t:.3f}, Recall@thr: {rec_t:.3f}, F1@thr: {f1_t:.3f}')
# Plot PR curve with the chosen operating point
plt.figure(figsize=(8, 6))
plt.plot(recall_vals, precision_vals, label=f'PR curve (AP={pr_auc:.3f})')
plt.scatter(recall_vals[best_idx], precision_vals[best_idx], color='red', label='Chosen threshold')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title(f'Precision-Recall: {best_model_name}')
plt.legend()
plt.tight_layout()
plt.show()
PR-AUC (average precision): 0.950 Best threshold for F1: 0.513 Precision@thr: 0.886, Recall@thr: 0.953, F1@thr: 0.918
0.950 = Excellent — the model is trustworthy for hypoxia detection
Your current settings are production-ready: So, we can catch 95.3% of hypoxic zones while keeping false alarms low (11.4% false positive rate)
Probability calibration¶
- Reliability curves and Brier score show how well predicted probabilities align with observed frequencies.
- We fit a sigmoid calibration (Platt scaling) on the best model using cross-validation, then compare before/after calibration.
# Reliability curve (calibration) and Brier score
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.base import clone
# Base probabilities
if hasattr(best_model, 'predict_proba'):
base_proba = best_model.predict_proba(X_test)[:, 1]
elif hasattr(best_model, 'decision_function'):
base_proba = best_model.decision_function(X_test)
else:
raise ValueError('Best model lacks probability/decision_function for calibration.')
brier_base = brier_score_loss(y_test, base_proba)
# Fit calibrated model (sigmoid/Platt scaling)
base_est = clone(best_model)
calibrator = CalibratedClassifierCV(base_est, method='sigmoid', cv=3)
calibrator.fit(X_train, y_train)
cal_proba = calibrator.predict_proba(X_test)[:, 1]
brier_cal = brier_score_loss(y_test, cal_proba)
print(f'Brier (uncalibrated): {brier_base:.4f}')
print(f'Brier (calibrated): {brier_cal:.4f}')
# Calibration curve
prob_true_base, prob_pred_base = calibration_curve(y_test, base_proba, n_bins=10)
prob_true_cal, prob_pred_cal = calibration_curve(y_test, cal_proba, n_bins=10)
plt.figure(figsize=(7, 7))
plt.plot([0, 1], [0, 1], 'k--', label='Perfectly calibrated')
plt.plot(prob_pred_base, prob_true_base, marker='o', label='Uncalibrated')
plt.plot(prob_pred_cal, prob_true_cal, marker='o', label='Calibrated (sigmoid)')
plt.xlabel('Predicted probability')
plt.ylabel('Observed frequency')
plt.title(f'Reliability diagram: {best_model_name}')
plt.legend()
plt.tight_layout()
plt.show()
# Save calibrated model for downstream use
calibrated_model_path = os.path.join(models_directory, f'{best_model_name}_calibrated.pkl')
joblib.dump(calibrator, calibrated_model_path)
print('Saved calibrated model to:', calibrated_model_path)
Brier (uncalibrated): 0.0651 Brier (calibrated): 0.0681
Saved calibrated model to: C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1\Random Forest_calibrated.pkl
Permutation importance and PDP/ICE¶
- Permutation importance measures drop in performance when shuffling each feature; model-agnostic and reliable across estimators.
- Partial dependence (PDP) and individual conditional expectation (ICE) show how predicted hypoxia risk changes with a feature, holding others marginally fixed.
# Model-agnostic permutation importance and PDP/ICE plots
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
# Permutation importance on the held-out test set
perm = permutation_importance(
best_model,
X_test,
y_test,
n_repeats=25,
random_state=42,
n_jobs=-1,
scoring='f1',
)
perm_df = pd.DataFrame({
'feature': feature_columns,
'importance_mean': perm.importances_mean,
'importance_std': perm.importances_std,
}).sort_values('importance_mean', ascending=True)
plt.figure(figsize=(8, 6))
plt.barh(perm_df['feature'], perm_df['importance_mean'], xerr=perm_df['importance_std'])
plt.title(f'Permutation importance (F1 loss): {best_model_name}')
plt.xlabel('Mean decrease in F1 when shuffled')
plt.tight_layout()
plt.show()
# Pick top features for PDP/ICE
top_features = perm_df.sort_values('importance_mean', ascending=False)['feature'].head(3).tolist()
print('Top features for PDP/ICE:', top_features)
fig, ax = plt.subplots(len(top_features), 1, figsize=(8, 4 * len(top_features)))
PartialDependenceDisplay.from_estimator(
best_model,
X_test,
features=top_features,
kind='both', # PDP + ICE
grid_resolution=30,
ax=ax,
)
plt.tight_layout()
plt.show()
Top features for PDP/ICE: ['chlor_a', 'Rrs_488', 'Rrs_469']
Partial Dependence Plot - Chlorophyll Effect¶
# Partial Dependence and ICE for chlor_a
from sklearn.inspection import PartialDependenceDisplay
# Build a DataFrame with feature names if needed
if hasattr(X_test, 'columns'):
X_pd = X_test
else:
X_pd = pd.DataFrame(X_test, columns=feature_columns)
feature_name = 'chlor_a'
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
# Average Partial Dependence
PartialDependenceDisplay.from_estimator(
best_model, X_pd, [feature_name], ax=ax[0], kind='average', grid_resolution=50
)
ax[0].set_title('Partial Dependence: chlor_a')
ax[0].set_xlabel('chlor_a')
ax[0].set_ylabel('Predicted hypoxia probability')
# ICE (individual curves)
PartialDependenceDisplay.from_estimator(
best_model, X_pd, [feature_name], ax=ax[1], kind='individual', subsample=200, grid_resolution=50
)
ax[1].set_title('ICE: chlor_a (subsample=200)')
ax[1].set_xlabel('chlor_a')
ax[1].set_ylabel('Predicted hypoxia probability')
plt.tight_layout()
plt.show()
ROC Curves¶
Visualize the binary classification performance of the best model across thresholds using the ROC curve and AUC.
# ROC curve for the best model
import os
import joblib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
# Ensure X_test and y_test exist; ensure we have a model
try:
best_model
except NameError:
# Load canonical best model from disk
models_directory = r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1'
best_model_path = os.path.join(models_directory, 'best_model.pkl')
best_model = joblib.load(best_model_path)
# Get scores for ROC (probabilities or decision function)
if hasattr(best_model, 'predict_proba'):
y_score = best_model.predict_proba(X_test)[:, 1]
elif hasattr(best_model, 'decision_function'):
y_score = best_model.decision_function(X_test)
# Normalize decision function to [0, 1] range for plotting
y_score = (y_score - y_score.min()) / (y_score.max() - y_score.min() + 1e-9)
else:
raise ValueError('Best model does not expose probabilities or decision scores.')
fpr, tpr, _ = roc_curve(y_test, y_score)
auc_val = roc_auc_score(y_test, y_score)
plt.figure(figsize=(7, 6))
plt.plot(fpr, tpr, label=f'Best model ROC (AUC={auc_val:.3f})', color='#2563eb', linewidth=2)
plt.plot([0, 1], [0, 1], '--', color='gray', label='Chance')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve — Hypoxia Classification')
plt.legend(loc='lower right')
plt.grid(alpha=0.2)
plt.tight_layout()
plt.show()
Multi-Model ROC Comparison¶
Overlay ROC curves for the base Random Forest, tuned Random Forest, and a calibrated model (if present) to compare ranking quality across thresholds.
# Overlay ROC curves for base RF, tuned RF, and calibrated model (if available)
import os
import joblib
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
# Require test data
if 'X_test' not in globals() or 'y_test' not in globals():
print('⚠️ X_test/y_test not found. Please run the Train/Test Split cell first.')
else:
models_directory = r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\models1'
candidates = {
'Base RF': os.path.join(models_directory, 'Random Forest.pkl'),
'Tuned RF': os.path.join(models_directory, 'RandomForest_tuned.pkl'),
'Calibrated': None,
}
# Try several calibrated filenames
for cand in ['best_model_calibrated.pkl', 'RandomForest_tuned_calibrated.pkl', 'Random Forest_calibrated.pkl']:
path_c = os.path.join(models_directory, cand)
if os.path.exists(path_c):
candidates['Calibrated'] = path_c
break
curves = {}
for name, path in candidates.items():
if path is None:
print(f'ℹ️ {name} not found; skipping.')
continue
if not os.path.exists(path):
print(f'ℹ️ {name} file missing: {path}; skipping.')
continue
try:
model = joblib.load(path)
if hasattr(model, 'predict_proba'):
y_score = model.predict_proba(X_test)[:, 1]
elif hasattr(model, 'decision_function'):
y_score = model.decision_function(X_test)
y_score = (y_score - y_score.min()) / (y_score.max() - y_score.min() + 1e-9)
else:
print(f'ℹ️ {name} has no probability/score method; skipping.')
continue
fpr, tpr, _ = roc_curve(y_test, y_score)
auc_val = roc_auc_score(y_test, y_score)
curves[name] = (fpr, tpr, auc_val)
except Exception as e:
print(f'⚠️ Could not evaluate {name}: {e}')
# Plot
if curves:
plt.figure(figsize=(7.5, 6.5))
palette = {
'Base RF': '#374151',
'Tuned RF': '#2563eb',
'Calibrated': '#059669',
}
for name, (fpr, tpr, auc_val) in curves.items():
color = palette.get(name, None)
plt.plot(fpr, tpr, label=f'{name} (AUC={auc_val:.3f})', linewidth=2, color=color)
plt.plot([0, 1], [0, 1], '--', color='gray', label='Chance')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Comparison — Hypoxia Classification')
plt.legend(loc='lower right')
plt.grid(alpha=0.2)
plt.tight_layout()
plt.show()
else:
print('⚠️ No curves to plot; check that model files exist in models directory.')
Why Chlorophyll‑a, POC, and SST are sensitive to hypoxia¶
- Chlorophyll‑a (biomass proxy): Elevated chlor_a signals phytoplankton blooms. When blooms decay, microbial decomposition consumes dissolved oxygen, often triggering or intensifying hypoxic conditions—especially in stratified waters.
- Particulate Organic Carbon (POC): POC represents exportable organic matter. As it sinks, benthic and pelagic respiration oxidize this carbon, increasing biochemical oxygen demand (BOD) and depleting DO in bottom layers.
- Sea Surface Temperature (SST): Warmer SST strengthens stratification (reduced vertical mixing), limiting oxygen resupply to deeper waters. Higher temperatures also accelerate microbial respiration, compounding oxygen loss.
- Combined effect: High chlor_a + elevated POC under warm, stratified conditions markedly increases hypoxia risk. These variables are physically linked to the mechanisms that reduce DO, which is why they appear highly informative in feature importance and PDP/ICE analyses.
Partial Dependence Plot: Chlorophyll (chlor_a)¶
This plot shows the marginal effect of chlorophyll concentration on the model's predicted hypoxia risk, holding other features at their observed values. We include:
- Partial Dependence (PD): Average effect across samples (left panel)
- Individual Conditional Expectation (ICE): Per-sample trajectories (right panel)
📊 How to Interpret Partial Dependence Plots¶
What is a Partial Dependence Plot?¶
A PDP isolates the relationship between a single feature (e.g., chlorophyll) and the model's predicted outcome (hypoxia probability), averaging out the effects of all other features. It answers: "How does changing chlorophyll alone affect hypoxia risk, on average?"
Key Components:¶
1. X-axis (Feature Values)
- Shows the range of chlorophyll concentrations in your data
- Units: mg/m³ (scaled by MinMaxScaler during training)
2. Y-axis (Predicted Probability)
- Model's predicted hypoxia probability (0–1 scale)
- Higher values = greater risk of hypoxia (DO < 2 mg/L)
3. Average Partial Dependence Line (Yellow/Blue curve)
- Represents the average effect across all samples
- Smooth curve = model learned a continuous relationship
- Jagged curve = model is sensitive to specific thresholds
Reading the Patterns:¶
📈 Upward Trend (Positive Effect)¶
Low chlor_a → Low hypoxia risk
High chlor_a → High hypoxia risk
Interpretation: Algal blooms (high chlorophyll) → decomposition consumes oxygen → hypoxia
Example: If chlor_a increases from 1 to 10 mg/m³, hypoxia probability rises from 0.2 to 0.8
📉 Downward Trend (Negative Effect)¶
High feature value → Low hypoxia risk
Interpretation: Feature acts as a protective factor (rare for chlorophyll in this context)
➡️ Flat Line (No Effect)¶
Horizontal line at any feature value
Interpretation: Feature does not influence predictions; model ignores it
🔄 Nonlinear Relationships¶
- S-shaped curve: Threshold effect (e.g., hypoxia risk jumps above 5 mg/m³ chlorophyll)
- Plateau: Saturation effect (additional chlorophyll has no extra impact beyond a certain point)
- U-shaped: Optimal range exists (both low and high values increase risk)
ICE Plots (Individual Conditional Expectation)¶
What are they?
Each thin line represents one sample's response curve. They show heterogeneity in how chlorophyll affects different locations/seasons.
Parallel ICE Lines:¶
All lines move in same direction with similar slopes
Interpretation: Chlorophyll effect is consistent across contexts (homogeneous effect)
Fanning ICE Lines:¶
Lines diverge at high chlorophyll values
Interpretation: Chlorophyll effect varies by context:
- Coastal regions may respond more strongly than deep water
- Summer may amplify the effect vs. winter
Crossing ICE Lines:¶
Some lines go up, others go down
Interpretation: Strong interactions with other features:
- High temperature + high chlorophyll = severe hypoxia
- Low temperature + high chlorophyll = mild impact
Example Interpretation for Your Plot:¶
Scenario: PD curve shows:
- Chlor_a < 2 mg/m³ → Hypoxia prob ≈ 0.1 (low risk)
- Chlor_a = 5 mg/m³ → Hypoxia prob ≈ 0.5 (moderate risk, threshold crossed)
- Chlor_a > 10 mg/m³ → Hypoxia prob ≈ 0.8 (high risk, plateau effect)
ICE lines: Fan out at chlor_a > 5 mg/m³
Conclusion:
- Chlorophyll is a strong hypoxia driver (steep positive slope)
- Critical threshold at ~5 mg/m³ (inflection point where risk accelerates)
- Context matters (fanning ICE = location/season interactions)
- Diminishing returns at extremes (plateau at high chlor_a = other factors limit further risk increase)
Actionable Insights:¶
✅ For Monitoring: Flag regions with chlor_a > 5 mg/m³ for increased hypoxia surveillance
✅ For Policy: Target nutrient reduction to keep chlorophyll below threshold
✅ For Forecasting: Combine chlorophyll with SST/stratification for early warning systems
✅ For Research: Investigate ICE heterogeneity—why do some locations resist high-chlorophyll hypoxia?
Caution:¶
- PDP assumes independence (changing chlorophyll alone may be unrealistic—often correlated with temperature/nutrients)
- PDP shows correlation, not causation (model learned associations, not mechanisms)
- PDP averages over other features—may hide important interaction effects (use SHAP for instance-level explanations)
Error analysis (FP/FN patterns)¶
- Inspect false negatives (missed hypoxia) and false positives to see feature patterns driving mistakes.
- Summaries below highlight average feature values for each error type vs correct predictions.
# Summarize error patterns for the best model
from sklearn.metrics import confusion_matrix
# Use calibrated probabilities if available, otherwise raw
try:
y_score_err = cal_proba
except NameError:
if hasattr(best_model, 'predict_proba'):
y_score_err = best_model.predict_proba(X_test)[:, 1]
elif hasattr(best_model, 'decision_function'):
y_score_err = best_model.decision_function(X_test)
else:
y_score_err = None
y_pred_err = best_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred_err)
print('Confusion matrix (rows=true, cols=pred):\n', cm)
# Build a small analysis table
err_df = X_test_df.copy()
err_df['true'] = y_test.values
err_df['pred'] = y_pred_err
err_df['score'] = y_score_err if y_score_err is not None else np.nan
false_neg = err_df[(err_df['true'] == 1) & (err_df['pred'] == 0)]
false_pos = err_df[(err_df['true'] == 0) & (err_df['pred'] == 1)]
true_pos = err_df[(err_df['true'] == 1) & (err_df['pred'] == 1)]
true_neg = err_df[(err_df['true'] == 0) & (err_df['pred'] == 0)]
print(f'Counts — TP: {len(true_pos)}, FP: {len(false_pos)}, FN: {len(false_neg)}, TN: {len(true_neg)}')
# Compare feature means for FN vs TP (where we most care)
if len(false_neg) > 0 and len(true_pos) > 0:
fn_tp_means = pd.DataFrame({
'FN_mean': false_neg[feature_columns].mean(),
'TP_mean': true_pos[feature_columns].mean(),
}).sort_values('FN_mean', ascending=False)
print('\nFeature means (FN vs TP) — watch for large gaps that may indicate miscalibration or missing interactions:')
print(fn_tp_means.head(8))
else:
print('No false negatives to summarize.')
# Quick glimpse of high-score FPs (if any)
if len(false_pos) > 0:
top_fp = false_pos.sort_values('score', ascending=False).head(5)[feature_columns + ['score']]
print('\nTop-scoring false positives (feature snapshot):')
print(top_fp)
else:
print('No false positives to inspect.')
Confusion matrix (rows=true, cols=pred):
[[315 44]
[ 17 342]]
Counts — TP: 342, FP: 44, FN: 17, TN: 315
Feature means (FN vs TP) — watch for large gaps that may indicate miscalibration or missing interactions:
FN_mean TP_mean
poc 527.646083 496.774446
sst 29.315045 29.798069
chlor_a 10.065308 9.160466
nflh 0.346490 0.259299
Rrs_547 0.009437 0.007357
Rrs_555 0.009366 0.007219
Rrs_531 0.008525 0.006824
Rrs_645 0.006665 0.003963
Top-scoring false positives (feature snapshot):
chlor_a nflh poc sst Rrs_412 Rrs_443 \
99155 2.844585 0.286568 336.566650 30.149412 0.002204 0.002776
99558 7.602374 0.164738 489.479981 29.863333 0.001752 0.002287
30943 12.548010 0.135143 534.666687 31.475262 0.003642 0.004211
31083 12.548010 0.135143 534.666687 31.475262 0.003642 0.004211
30914 12.548010 0.135143 534.666687 31.475262 0.003642 0.004211
Rrs_469 Rrs_488 Rrs_531 Rrs_547 Rrs_555 Rrs_645 Rrs_667 \
99155 0.003416 0.003844 0.004506 0.004420 0.004229 0.001052 0.000934
99558 0.002890 0.003317 0.004467 0.004649 0.004542 0.001565 0.001382
30943 0.005196 0.005676 0.009309 0.010499 0.010354 0.007108 0.006409
31083 0.005196 0.005676 0.009309 0.010499 0.010354 0.007108 0.006409
30914 0.005196 0.005676 0.009309 0.010499 0.010354 0.007108 0.006409
Rrs_678 score
99155 0.001045 0.942652
99558 0.001445 0.937517
30943 0.006021 0.937316
31083 0.006021 0.937316
30914 0.006021 0.937316
Interpretation of Error analysis¶
The confusion matrix indicates strong hypoxia detection with recall ≈95% (342/359) and non‑hypoxic recall ≈88% (315/359), producing overall accuracy ≈91.5%. Precision for the hypoxic class is ≈88.6% (342/(342+44)), so some false alarms occur relative to true detections. False negatives show slightly higher chlorophyll‑a, POC, and NFLH but slightly lower SST than true positives, suggesting cases with substantial biomass/organic load under weaker stratification that the model misses. Elevated Rrs (531/547/555/645) in false negatives versus true positives points to spectral contexts where interactions or calibration may be under-modeled. Top false positives arise in warm (≈30–31.5°C), high‑POC scenes with moderate‑to‑high chlor_a, implying the model leans toward risk in stratified, organic‑rich waters; probability calibration or threshold tuning via precision–recall can reduce these false alarms while preserving high hypoxia recall.
False negatives happen in cool, biomass‑rich but weakly stratified waters, where high chlorophyll/POC mimic hypoxic spectra without true oxygen loss.
False positives occur in warm, organic‑rich, productive waters—conditions that genuinely favor stratification—so the model reasonably errs on the side of caution.
Cross-validation robustness check¶
- Stratified k-fold CV on the balanced dataset to confirm stability of F1, balanced accuracy, and ROC-AUC beyond a single split.
# Stratified k-fold CV metrics for robustness
from sklearn.model_selection import StratifiedKFold, cross_val_score
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = {
'f1': 'f1',
'balanced_accuracy': 'balanced_accuracy',
'roc_auc': 'roc_auc',
}
cv_scores = {}
for metric_name, metric in scoring.items():
scores = cross_val_score(best_model, X, y, cv=cv, scoring=metric, n_jobs=-1)
cv_scores[metric_name] = scores
print(f'{metric_name}: mean={scores.mean():.3f}, std={scores.std():.3f}, min={scores.min():.3f}, max={scores.max():.3f}')
# Quick table
cv_df = pd.DataFrame(cv_scores)
cv_df
f1: mean=0.919, std=0.006, min=0.912, max=0.928 balanced_accuracy: mean=0.916, std=0.006, min=0.908, max=0.925 roc_auc: mean=0.960, std=0.003, min=0.955, max=0.963
| f1 | balanced_accuracy | roc_auc | |
|---|---|---|---|
| 0 | 0.920892 | 0.918523 | 0.960992 |
| 1 | 0.928000 | 0.924939 | 0.963302 |
| 2 | 0.911647 | 0.907950 | 0.955025 |
| 3 | 0.914513 | 0.910042 | 0.961739 |
| 4 | 0.919028 | 0.916318 | 0.961415 |
Part 5: Spatial Prediction¶
Spatial prediction across satellite imagery¶
Using the trained best model, we can now generate hypoxia risk maps by applying predictions to MODIS satellite raster files. This section:
- Loads multi-band GeoTIFF rasters from the MODIS output folder
- Reshapes and normalizes band data using the fitted scaler
- Applies the best model to generate pixel-level predictions
- Saves predictions as georeferenced GeoTIFF files preserving spatial metadata
Output: Hypoxia risk predictions for each pixel across the Gulf of Mexico study area.
# Apply trained model to satellite raster files for spatial prediction
# Input raster files (MODIS GeoTIFFs with 14 bands)
# files D:\my works\bloom\data\MODIS
rasters=glob.glob(r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\modis out\*.tif')
# path
output_path=r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
os.makedirs(output_path, exist_ok=True)
# Load the fitted scaler used during model training
scaler_path = os.path.join(models_directory, 'MinMaxScaler.pkl')
scaler = joblib.load(scaler_path)
print(f'✓ Loaded fitted scaler from: {scaler_path}')
print(f'Found {len(rasters)} raster files to process.')
# Process each raster
for raster in rasters:
print(f'\nProcessing: {os.path.basename(raster)}')
with rasterio.open(raster) as src:
# Read all bands into a numpy array (bands, height, width)
array = src.read()
# Capture spatial metadata for output
meta = src.meta
print(f' Raster shape (bands, height, width): {array.shape}')
# Stack bands into (height, width, bands) format
array_stack = np.stack(array, axis=-1)
print(f' Stacked shape (height, width, bands): {array_stack.shape}')
# Reshape to 2D for prediction: (n_pixels, n_bands)
array_reshaped = array_stack.reshape(-1, array_stack.shape[-1])
print(f' Reshaped for model (n_pixels, n_bands): {array_reshaped.shape}')
# Create DataFrame with proper feature names
df_array = pd.DataFrame(
array_reshaped,
columns=['chlor_a', 'nflh', 'poc', 'sst', 'Rrs_412',
'Rrs_443', 'Rrs_469', 'Rrs_488', 'Rrs_531', 'Rrs_547', 'Rrs_555',
'Rrs_645', 'Rrs_667', 'Rrs_678']
)
print(f' DataFrame shape: {df_array.shape}')
print(f' First 2 rows:\n{df_array.head(2)}')
# Normalize features using the same scaler used during training
df_array_normalized = scaler.transform(df_array)
print(f' Normalized shape: {df_array_normalized.shape}')
# Load the best trained model
best_model_path = os.path.join(models_directory, 'best_model.pkl')
model = joblib.load(best_model_path)
print(f' Loaded model from: {best_model_path}')
# Generate predictions for all pixels
predictions = model.predict(df_array_normalized)
print(f' Predictions shape: {predictions.shape}')
# Reshape predictions back to 2D raster (height, width)
predictions_2d = predictions.reshape(array_stack.shape[0], array_stack.shape[1])
print(f' Predictions 2D shape (height, width): {predictions_2d.shape}')
# Update metadata for single-band output raster
meta.update(count=1, dtype=rasterio.uint8, height=predictions_2d.shape[0], width=predictions_2d.shape[1])
# Save prediction as GeoTIFF with spatial metadata
output_filename = os.path.splitext(os.path.basename(raster))[0] + '_hypoxia_pred.tif'
output_file = os.path.join(output_path, output_filename)
with rasterio.open(output_file, 'w', **meta) as dst:
dst.write(predictions_2d.astype(rasterio.uint8), 1)
print(f' ✓ Saved prediction to: {output_file}')
# Clear memory
del array, array_stack, array_reshaped, df_array, df_array_normalized, predictions, predictions_2d
print(f'\n✓ Raster prediction complete! All outputs saved to: {output_path}')
Part 7: Temporal Trend Analysis & Visualization¶
Multi-Scale Temporal Dynamics of Predicted Hypoxia¶
This section synthesizes spatial quantification results into publication-ready visualizations that reveal temporal patterns in hypoxia zone extent. Through monthly, seasonal, and annual analyses, we identify:
- Seasonal Cycles: When does hypoxia peak? (typically summer in Gulf of Mexico)
- Interannual Variability: How much does the dead zone size vary year-to-year?
- Extreme Events: Are there anomalous years or hurricane-driven changes?
- Trend Analysis: Is the long-term trajectory increasing or decreasing?
Workflow Overview¶
- Load Area Time-Series: Read
area.csvcontaining hypoxia zone area (km²) per satellite acquisition date - Extract Temporal Features: Parse dates; group by month, season, and year
- Generate Multi-Panel Figures:
- (a) Monthly Boxplot: Shows distribution per calendar month, revealing seasonal phase
- (b) Seasonal Violin Plot: Aggregates 3-month seasons with individual observations for pattern robustness
- (c) Time Series with Trend: Full temporal trajectory with regression line, mean baseline, and optional hurricane markers
- Statistical Summary: Compute descriptive statistics (mean, std, min, max) per month/season
- Cross-Validation with In-Situ: Compare satellite-derived extents against independent water quality buoys (if available)
Key Outputs¶
- Publication-ready figures with professional styling (fonts, colors, annotations)
- Numerical summaries of seasonal/annual hypoxia magnitude
- Anomaly detection: Years or months deviating significantly from baseline
- Trend confidence intervals: Bootstrap-based uncertainty on slope estimates
Temporal Trend Analysis: Multi-Scale Visualization of Predicted Hypoxia Zone Extent¶
This section analyzes the temporal dynamics of predicted hypoxia zones through multiple complementary visualizations:
- Monthly Boxplot (a): Distribution of hypoxia area by calendar month, revealing seasonal patterns and variability
- Seasonal Violin Plot (b): Aggregated analysis across seasons (Spring, Summer, Winter), accounting for interannual variability with individual data points overlaid
- Time Series with Trends (c): Full temporal trajectory with linear regression trend line and mean baseline for identifying long-term patterns
The workflow loads the area quantification results from area.csv, filters for hypoxic class data, and generates publication-quality multi-panel figures with statistical summaries.
import re
hurrican = pd.read_csv(r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Hurricane.csv')
# Convert hurricane Date strings to datetime (supports formats like 'May-2010' or '01-May-2010')
def convert_date(date):
for fmt in ('%b-%Y', '%d-%b-%Y'):
try:
return pd.to_datetime(date, format=fmt)
except ValueError:
continue
return pd.NaT # Return not-a-time for invalid dates
hurrican['Date'] = hurrican['Date'].apply(convert_date)
path = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
# read csv
df_area = pd.read_csv(os.path.join(path, 'area.csv'))
# Helper to parse file names like 'SMI2010_05' into timestamps (defaults to day=1)
def parse_file_date(val: str):
# First, try direct parsing (handles ISO/mixed)
dt = pd.to_datetime(val, errors='coerce', format=None)
if pd.notnull(dt):
return dt
# Fallback: extract YYYY and MM from patterns like SMI2010_05 or 2010-05
m = re.search(r'(\d{4})[_-]?(\d{2})', str(val))
if m:
year = int(m.group(1))
month = int(m.group(2))
return pd.Timestamp(year=year, month=month, day=1)
return pd.NaT
# Select hypoxia class and parse dates
df_hypo = df_area[['file', 'class_1_area_sqkm']].copy()
df_hypo['file_dt'] = df_hypo['file'].apply(parse_file_date)
# Drop rows where date could not be parsed
df_hypo = df_hypo.dropna(subset=['file_dt'])
# Sort by parsed date
df_hypo.sort_values(by='file_dt', inplace=True)
# Temporal features
df_hypo['month'] = df_hypo['file_dt'].dt.month
df_hypo['season'] = df_hypo['month'] % 12 + 1
df_hypo['season'] = df_hypo['season'].apply(
lambda x: 'Spring' if 3 <= x <= 5 else (
'Summer' if 6 <= x <= 8 else (
'Winter' if x == 12 or x <= 2 else None)))
# Remove rows with season as None (Autumn months)
df_hypo = df_hypo[df_hypo['season'].notna()]
palette = sns.color_palette("hsv", 12)
# Create a figure with three subplots
fig, (ax3, ax2, ax1) = plt.subplots(3, 1, figsize=(15, 15))
# Monthly boxplot
sns.boxplot(x='month', y='class_1_area_sqkm', data=df_hypo, ax=ax3, palette=palette)
ax3.set_ylabel('Predicted Hypoxic area over sqkm')
ax3.set_xlabel('Month')
ax3.annotate('(a)', xy=(0.01, 0.95), xycoords='axes fraction', fontsize=12,
horizontalalignment='left', verticalalignment='top')
# Seasonal violin + swarm
sns.violinplot(x='season', y='class_1_area_sqkm', data=df_hypo, ax=ax2, palette='Set3', inner=None)
sns.swarmplot(x='season', y='class_1_area_sqkm', data=df_hypo, ax=ax2, color='k', size=3)
ax2.set_ylabel('Predicted Hypoxic area over sqkm')
ax2.annotate('(b)', xy=(0.01, 0.95), xycoords='axes fraction', fontsize=12,
horizontalalignment='left', verticalalignment='top')
# Time series with trend and optional hurricane markers
sns.lineplot(x=df_hypo['file_dt'], y=df_hypo['class_1_area_sqkm'], ax=ax1, color='blue')
if 'class_1_area_sqkm' in hurrican.columns:
sns.scatterplot(x=hurrican['Date'], y=hurrican['class_1_area_sqkm'], color='red', s=100,
ax=ax1, label='Hurricane Events')
ax1.set_xlim(df_hypo['file_dt'].min(), df_hypo['file_dt'].max())
z = np.polyfit(range(len(df_hypo)), df_hypo['class_1_area_sqkm'], 1)
p = np.poly1d(z)
ax1.plot(df_hypo['file_dt'], p(range(len(df_hypo))), "r--", label='Trend line')
mean_val = df_hypo['class_1_area_sqkm'].mean()
ax1.axhline(mean_val, color='gray', linestyle='dotted', label='Averaged line')
ax1.set_ylabel('Predicted Hypoxic area over sqkm')
ax1.set_xlabel('')
ax1.annotate('(c)', xy=(0.01, 0.95), xycoords='axes fraction', fontsize=12,
horizontalalignment='left', verticalalignment='top')
ax1.legend(loc='upper center')
plt.tight_layout()
# plt.savefig(r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\figure\seasonal_analysis_v3.jpg', dpi=600, bbox_inches='tight')
plt.show()
C:\Users\hafez\AppData\Local\Temp\ipykernel_59372\4094274612.py:58: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.boxplot(x='month', y='class_1_area_sqkm', data=df_hypo, ax=ax3, palette=palette) C:\Users\hafez\AppData\Local\Temp\ipykernel_59372\4094274612.py:58: UserWarning: The palette list has more values (12) than needed (9), which may not be intended. sns.boxplot(x='month', y='class_1_area_sqkm', data=df_hypo, ax=ax3, palette=palette) C:\Users\hafez\AppData\Local\Temp\ipykernel_59372\4094274612.py:65: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect. sns.violinplot(x='season', y='class_1_area_sqkm', data=df_hypo, ax=ax2, palette='Set3', inner=None)
Interpreting the prediction rasters¶
- Pixel values: 1 = predicted hypoxic (DO < 2 mg/L), 0 = predicted non-hypoxic (DO ≥ 2 mg/L)
- Spatial coverage: One prediction per pixel covering the study area at 1 km resolution
- Temporal: Each raster corresponds to a specific MODIS acquisition date; stack across time to assess seasonal/interannual patterns
- Uncertainty: Predictions are binary; overlay with model probability maps (from calibrated model) for confidence assessment
Next steps for analysis:¶
- Stack prediction rasters temporally to track hypoxia zone extent over time
- Compute hypoxia area extent (% of study area hypoxic) per month/season
- Cross-validate predictions with in-situ dissolved oxygen measurements where available
- Analyze spatial persistence: how long do hypoxic regions persist?
Part 6: Uncertainty quantification¶
Mathematical Foundation: Uncertainty Quantification Methods¶
Overview¶
This notebook quantifies prediction uncertainty using probabilistic outputs from ensemble classifiers. Rather than binary 0/1 predictions, we generate calibrated probability estimates that reflect model confidence in each classification decision.
1. Probability Estimation via predict_proba()¶
For a trained classifier (e.g., Random Forest), the predict_proba() method estimates the posterior probability $P(Y=1 | \mathbf{x})$ that a pixel with features $\mathbf{x}$ belongs to the hypoxia class (class 1).
Random Forest Probability Mechanism¶
A Random Forest with $N$ decision trees produces predictions by majority voting:
$$ P(Y=1 | \mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(T_i(\mathbf{x}) = 1) $$
where:
- $T_i(\mathbf{x})$ = prediction of tree $i$ for input $\mathbf{x}$
- $\mathbb{1}(\cdot)$ = indicator function (1 if true, 0 otherwise)
- $N$ = number of trees in the ensemble (e.g., 100-300 trees)
Interpretation:
- $P(Y=1 | \mathbf{x}) = 0.95$ means 95% of trees voted "hypoxic"
- $P(Y=1 | \mathbf{x}) = 0.52$ means only 52% of trees voted "hypoxic" (low confidence, near decision boundary)
Code Implementation¶
# Get probability of hypoxia (class 1) for each pixel
probabilities = model.predict_proba(df_array_normalized)[:, 1]
The [:, 1] indexing extracts the second column (class 1 = hypoxia). For binary classification:
[:, 0]= $P(Y=0 | \mathbf{x})$ (non-hypoxic)[:, 1]= $P(Y=1 | \mathbf{x})$ (hypoxic)
By definition: $P(Y=0 | \mathbf{x}) + P(Y=1 | \mathbf{x}) = 1$
2. Probability Calibration¶
Problem: Raw Random Forest probabilities can be miscalibrated (e.g., 70% probability doesn't mean 70% of such predictions are correct).
Solution: Apply Platt scaling (logistic calibration) to align predicted probabilities with empirical frequencies.
Platt Scaling Equation¶
Fit a logistic regression on out-of-fold predictions:
$$ P_{\text{calibrated}}(Y=1 | \mathbf{x}) = \frac{1}{1 + \exp(A \cdot f(\mathbf{x}) + B)} $$
where:
- $f(\mathbf{x})$ = raw model output (e.g., uncalibrated probability or decision function)
- $A, B$ = parameters fitted via cross-validation to minimize log-loss
Training Process:
from sklearn.calibration import CalibratedClassifierCV
# Fit calibrator on validation set using 3-fold CV
calibrated_model = CalibratedClassifierCV(
base_model, method='sigmoid', cv=3
)
calibrated_model.fit(X_train, y_train)
# Get calibrated probabilities
probabilities = calibrated_model.predict_proba(X_test)[:, 1]
Effect: Calibrated probabilities better match true event rates (e.g., if model assigns 80% probability to 100 pixels, ~80 should be hypoxic).
3. Uncertainty Quantification via Shannon Entropy¶
Shannon entropy measures prediction uncertainty. For binary classification:
$$ H(p) = -p \log_2(p) - (1 - p) \log_2(1 - p) $$
where:
- $p = P(Y=1 | \mathbf{x})$ = predicted probability of hypoxia
- $H(p) \in [0, 1]$ bits (binary entropy range)
Entropy Properties¶
| Probability $p$ | Entropy $H(p)$ | Interpretation |
|---|---|---|
| $p = 0.0$ or $1.0$ | $H = 0$ | Maximum confidence (certain non-hypoxic or hypoxic) |
| $p = 0.5$ | $H = 1$ | Maximum uncertainty (equiprobable classes, decision boundary) |
| $p = 0.8$ or $0.2$ | $H \approx 0.72$ | Moderate confidence |
Code Implementation¶
# Compute Shannon entropy for each pixel
entropy = -probabilities * np.log2(probabilities + 1e-9) \
- (1 - probabilities) * np.log2(1 - probabilities + 1e-9)
# Identify high-confidence pixels (low entropy)
high_confidence = (probabilities > 0.8) | (probabilities < 0.2)
pct_confident = 100 * high_confidence.sum() / high_confidence.size
Small constant $1e-9$: Prevents $\log(0)$ numerical errors (negligible impact on entropy).
4. Uncertainty Raster Encoding¶
To save probabilities as GeoTIFF rasters (efficient uint8 format):
$$ \text{pixel}_{\text{uint8}} = \lfloor 255 \times p \rfloor $$
where:
- $p \in [0, 1]$ = probability
- $\text{pixel}_{\text{uint8}} \in \{0, 1, \ldots, 255\}$
Decoding for visualization:
$$ p = \frac{\text{pixel}_{\text{uint8}}}{255} $$
Interpretation Scale¶
| Pixel Value (uint8) | Probability $p$ | Meaning |
|---|---|---|
| 0 | 0.00 | Certain non-hypoxic |
| 64 | 0.25 | Low probability hypoxic |
| 127 | 0.50 | Maximum uncertainty (decision boundary) |
| 191 | 0.75 | High probability hypoxic |
| 255 | 1.00 | Certain hypoxic |
Code:
# Scale probabilities to uint8 for storage
probabilities_uint8 = (probabilities_2d * 255).astype(np.uint8)
# Save uncertainty raster
with rasterio.open(output_file, 'w', dtype='uint8', ...) as dst:
dst.write(probabilities_uint8, 1)
5. Uncertainty Sources & Interpretation¶
5.1 Aleatoric Uncertainty (Data Noise)¶
Irreducible uncertainty from measurement noise, natural variability, or ambiguous class boundaries:
- Satellite sensor noise: chlorophyll-a retrieval errors (~30% accuracy)
- Temporal mismatch: MODIS snapshot vs. DO measurement time lag
- Spatial heterogeneity: 1 km pixel may contain hypoxic + non-hypoxic patches
Quantification: High entropy at true boundaries between hypoxic/non-hypoxic waters.
5.2 Epistemic Uncertainty (Model Ignorance)¶
Reducible uncertainty from insufficient training data or model capacity:
- Feature space gaps: Test pixels with $\mathbf{x}$ values outside training range
- Class imbalance: Minority class (hypoxia) underrepresented in training
- Model simplicity: Linear boundaries in complex nonlinear systems
Quantification: Ensemble disagreement (tree vote spread in Random Forest).
5.3 Spatial Uncertainty Patterns¶
Expected high-uncertainty zones:
- Hypoxia edges: Gradients between hypoxic core and ambient waters
- Cloud-contaminated pixels: Missing/interpolated satellite data
- Shallow waters: Bottom reflectance interferes with ocean color signal
- River plumes: Turbidity violates open-ocean bio-optical assumptions
Code to detect high-uncertainty pixels:
# Pixels near decision boundary (probability ≈ 0.5)
uncertain_mask = (probabilities > 0.4) & (probabilities < 0.6)
# Pixels requiring ground-truth validation
validation_priority = entropy > 0.8 # High entropy
6. Decision Thresholds & Risk Management¶
Default Classification Rule¶
$$ \hat{y} = \begin{cases} 1 \text{ (hypoxic)} & \text{if } P(Y=1 | \mathbf{x}) \geq 0.5 \\ 0 \text{ (non-hypoxic)} & \text{if } P(Y=1 | \mathbf{x}) < 0.5 \end{cases} $$
Risk-Adjusted Thresholds¶
Adjust threshold $\tau$ based on cost asymmetry:
$$ \hat{y} = \begin{cases} 1 & \text{if } P(Y=1 | \mathbf{x}) \geq \tau \\ 0 & \text{if } P(Y=1 | \mathbf{x}) < \tau \end{cases} $$
Examples:
- Conservative (aquaculture): $\tau = 0.3$ (flag hypoxia early, minimize false negatives)
- Balanced (research): $\tau = 0.5$ (standard threshold)
- Stringent (public advisories): $\tau = 0.7$ (reduce false alarms, require high confidence)
Code:
# Apply custom threshold
predictions_conservative = (probabilities >= 0.3).astype(int)
predictions_stringent = (probabilities >= 0.7).astype(int)
7. Validation Metrics for Uncertainty¶
Brier Score (Probability Calibration)¶
$$ \text{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2 $$
where:
- $p_i$ = predicted probability
- $y_i \in \{0, 1\}$ = true label
- Lower is better (perfect calibration = 0)
Code:
from sklearn.metrics import brier_score_loss
brier = brier_score_loss(y_test, probabilities)
print(f'Brier Score: {brier:.4f}')
Reliability Diagram¶
Plot predicted probabilities vs. observed frequencies:
- Well-calibrated model: Points lie on diagonal line
- Overconfident model: Points below diagonal
- Underconfident model: Points above diagonal
Summary Table: Uncertainty Methods¶
| Method | Equation | Range | Interpretation |
|---|---|---|---|
| Raw Probability | $P(Y=1) = \frac{1}{N}\sum_i T_i$ | $[0, 1]$ | Fraction of trees voting hypoxic |
| Calibrated Probability | $P_{\text{cal}} = \sigma(A \cdot f + B)$ | $[0, 1]$ | Adjusted for empirical frequencies |
| Shannon Entropy | $H = -p\log_2 p - (1-p)\log_2(1-p)$ | $[0, 1]$ bits | Information content (0=certain, 1=uncertain) |
| Uint8 Encoding | $\text{pixel} = 255 \times p$ | $[0, 255]$ | Storage-efficient raster format |
Key Takeaway: Probabilistic outputs (predict_proba) enable risk-aware decision-making by distinguishing high-confidence predictions (core hypoxic/non-hypoxic zones) from uncertain boundaries requiring in-situ validation.
Beyond binary predictions, we can also generate probability rasters showing model confidence for each pixel.
- Use calibrated probabilities (better aligned with true event rates) or raw probabilities from best_model
- Pixel values range 0–255 (scaled from 0–1 probability): 0 = very confident NOT hypoxic, 255 = very confident hypoxic
- Uncertainty high near edges where probability ≈ 0.5; low in core/non-hypoxic regions
- Overlay binary + probability rasters for risk-aware decision-making: filter predictions by confidence threshold
# Generate uncertainty (probability) rasters alongside binary predictions
# Input raster files (MODIS GeoTIFFs with 14 bands)
rasters = glob.glob(r'C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\modis out\*.tif')
# Output directories
output_path_binary = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\probability"
output_path_uncertainty = os.path.join(output_path_binary, "uncertainty")
os.makedirs(output_path_binary, exist_ok=True)
os.makedirs(output_path_uncertainty, exist_ok=True)
print(f'Found {len(rasters)} raster files to process for uncertainty quantification.')
# Process each raster
for raster in rasters:
print(f'\nProcessing: {os.path.basename(raster)}')
with rasterio.open(raster) as src:
# Read all bands into a numpy array (bands, height, width)
array = src.read()
# Capture spatial metadata for output
meta = src.meta
print(f' Raster shape (bands, height, width): {array.shape}')
# Stack bands into (height, width, bands) format
array_stack = np.stack(array, axis=-1)
# Reshape to 2D for prediction: (n_pixels, n_bands)
array_reshaped = array_stack.reshape(-1, array_stack.shape[-1])
# Create DataFrame with proper feature names
df_array = pd.DataFrame(
array_reshaped,
columns=['chlor_a', 'nflh', 'poc', 'sst', 'Rrs_412',
'Rrs_443', 'Rrs_469', 'Rrs_488', 'Rrs_531', 'Rrs_547', 'Rrs_555',
'Rrs_645', 'Rrs_667', 'Rrs_678']
)
# Normalize features using the same scaler used during training
df_array_normalized = scaler.transform(df_array)
# Load the best trained model
best_model_path = os.path.join(models_directory, 'best_model.pkl')
model = joblib.load(best_model_path)
# Generate binary predictions
predictions = model.predict(df_array_normalized)
# Generate probability predictions (use calibrated model if available)
try:
# Try to load and use the calibrated model for better uncertainty estimates
calibrated_model_path = os.path.join(models_directory, f'{best_model_name}_calibrated.pkl')
if os.path.exists(calibrated_model_path):
calibrator_loaded = joblib.load(calibrated_model_path)
probabilities = calibrator_loaded.predict_proba(df_array_normalized)[:, 1]
print(f' ✓ Using calibrated probabilities for uncertainty')
else:
# Fall back to raw probabilities from best model
if hasattr(model, 'predict_proba'):
probabilities = model.predict_proba(df_array_normalized)[:, 1]
else:
probabilities = model.decision_function(df_array_normalized)
# Normalize decision function to [0, 1] range
probabilities = (probabilities - probabilities.min()) / (probabilities.max() - probabilities.min() + 1e-9)
print(f' Using raw model probabilities for uncertainty')
except Exception as e:
print(f' Warning: Could not load calibrated model, using raw probabilities: {e}')
if hasattr(model, 'predict_proba'):
probabilities = model.predict_proba(df_array_normalized)[:, 1]
else:
probabilities = model.decision_function(df_array_normalized)
probabilities = (probabilities - probabilities.min()) / (probabilities.max() - probabilities.min() + 1e-9)
# Reshape predictions and probabilities back to 2D raster (height, width)
predictions_2d = predictions.reshape(array_stack.shape[0], array_stack.shape[1])
probabilities_2d = probabilities.reshape(array_stack.shape[0], array_stack.shape[1])
print(f' Predictions 2D shape: {predictions_2d.shape}')
print(f' Probabilities range: [{probabilities_2d.min():.3f}, {probabilities_2d.max():.3f}]')
# Save binary prediction raster
meta_binary = meta.copy()
meta_binary.update(count=1, dtype=rasterio.uint8, height=predictions_2d.shape[0], width=predictions_2d.shape[1])
output_filename_binary = os.path.splitext(os.path.basename(raster))[0] + '_hypoxia_pred.tif'
output_file_binary = os.path.join(output_path_binary, output_filename_binary)
with rasterio.open(output_file_binary, 'w', **meta_binary) as dst:
dst.write(predictions_2d.astype(rasterio.uint8), 1)
print(f' ✓ Saved binary prediction to: {output_file_binary}')
# Save uncertainty (probability) raster as uint8 (0-255 scale)
# 0 = certain non-hypoxic, 255 = certain hypoxic, 127 = maximum uncertainty
probabilities_uint8 = (probabilities_2d * 255).astype(rasterio.uint8)
meta_uncertainty = meta.copy()
meta_uncertainty.update(count=1, dtype=rasterio.uint8, height=probabilities_2d.shape[0], width=probabilities_2d.shape[1])
output_filename_uncertainty = os.path.splitext(os.path.basename(raster))[0] + '_hypoxia_uncertainty.tif'
output_file_uncertainty = os.path.join(output_path_uncertainty, output_filename_uncertainty)
with rasterio.open(output_file_uncertainty, 'w', **meta_uncertainty) as dst:
dst.write(probabilities_uint8, 1)
print(f' ✓ Saved uncertainty raster to: {output_file_uncertainty}')
# Compute and display uncertainty stats
entropy = -probabilities_2d * np.log2(probabilities_2d + 1e-9) - (1 - probabilities_2d) * np.log2(1 - probabilities_2d + 1e-9)
high_confidence = (probabilities_2d > 0.8) | (probabilities_2d < 0.2)
pct_confident = 100 * high_confidence.sum() / high_confidence.size
print(f' High-confidence pixels (prob > 0.8 or < 0.2): {pct_confident:.1f}%')
# Clear memory
del array, array_stack, array_reshaped, df_array, df_array_normalized, predictions, probabilities
del predictions_2d, probabilities_2d, probabilities_uint8, entropy, high_confidence
print(f'\n✓ Uncertainty quantification complete!')
print(f' Binary predictions saved to: {output_path_binary}')
print(f' Uncertainty rasters saved to: {output_path_uncertainty}')
# Visualize a single uncertainty raster (uint8 0–255 → probability 0–1)
import os
import numpy as np
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import plotting_extent
uncertainty_path = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\probability\uncertainty\OCEANDATA_MODIS-Aqua_L3SMI2002-07-01_hypoxia_uncertainty.tif"
with rasterio.open(uncertainty_path) as src:
data_u8 = src.read(1) # uint8 values 0–255
extent = plotting_extent(src)
res = src.res
crs = src.crs
bounds = src.bounds
# Convert to probability scale (0–1)
data_prob = data_u8.astype(np.float32) / 255.0
uncertainty = 1.0 - np.minimum(np.abs(data_prob - 0.5) * 2.0, 1.0)
# Plot probability map and histogram
try:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
HAS_CARTOPY = True
except Exception:
HAS_CARTOPY = False
# Optional cartopy overlay (coastline)
if HAS_CARTOPY:
fig, ax = plt.subplots(1, 1, figsize=(10, 6), subplot_kw={'projection': ccrs.PlateCarree()})
im = ax.imshow(uncertainty, extent=extent, cmap='YlOrBr', vmin=0, vmax=1, transform=ccrs.PlateCarree())
ax.add_feature(cfeature.LAND, facecolor='lightgrey', zorder=0)
ax.add_feature(cfeature.COASTLINE, linewidth=0.8, edgecolor='black', zorder=2)
gl = ax.gridlines(draw_labels=True, linewidth=0.5, color='gray', alpha=0.5, linestyle='--')
gl.top_labels = False
gl.right_labels = False
plt.colorbar(im, ax=ax, orientation='vertical', label='Uncertainty (0–1)')
ax.set_title('Uncertainty (with coastlines)', fontsize=12)
plt.tight_layout()
plt.show()
Using uncertainty for risk filtering¶
Workflow for decision-making:
- Load binary prediction + uncertainty rasters in GIS (e.g., QGIS, ArcGIS)
- High-confidence hypoxia zones: pixels where
uncertainty > 200(>78% probability hypoxic)- These are most reliable for identifying true dead zones
- Low-confidence edges: pixels where
uncertainty 100–155(near 50% threshold)- Indicate transition zones or spatial/temporal boundary effects
- Combine with in-situ data: validate model against independent DO measurements to calibrate threshold
Output files:
*_hypoxia_pred.tif: binary (0/1) per-pixel predictions*_hypoxia_uncertainty.tif: scaled 0–255 confidence scores (uint8)
Interpretation Notes¶
- High uncertainty in boundaries: Pixels at hypoxia edges (probability ~0.5) may flip class; consider overlaying uncertainty rasters for confidence assessment
- Resolution vs. detection: 1 km pixels may miss small hypoxic pockets; stack with higher-resolution sensors for fine details
- Temporal resolution: Monthly MODIS composites smooth daily variability; gaps possible during cloud cover
- Validation: Always cross-reference satellite-derived extents with in-situ CTD or bottle samples
Part 6: Spatial Quantification & Temporal Analysis¶
Raster Reprojection & Hypoxia Area Quantification¶
Convert prediction rasters to projected coordinates and calculate time-series of hypoxia zone extents.
Workflow Overview¶
This section quantifies the spatial extent of predicted hypoxia across all satellite raster tiles, enabling temporal tracking of dead zone dynamics:
Step 1: Load All Prediction Rasters¶
- Scans output directory for binary prediction GeoTIFFs (
*_hypoxia_pred.tif) - Each raster represents predictions for a specific MODIS acquisition date
Step 2: Reproject to Projected CRS¶
- Original coordinates: Geographic (WGS84 lat/lon) — distort area calculations at regional scales
- Target CRS: EPSG:3814 (State Plane Coordinate System, Gulf of Mexico) — preserves distances and areas
- Method: Nearest-neighbor resampling to maintain class integrity (0/1 pixel values)
Step 3: Calculate Class Areas¶
For each reprojected raster:
- Count pixels in each class:
- Class 0 = Non-hypoxic (DO ≥ 2 mg/L)
- Class 1 = Hypoxic/Dead zone (DO < 2 mg/L)
- Convert pixel counts to physical area:
- Multiply by reprojected pixel size (m²) → m²
- Scale to km² (÷ 1,000,000)
Step 4: Build Time-Series Summary¶
Create DataFrame with columns:
| Field | Description |
|---|---|
file |
Date stamp from filename (e.g., 2020-06-01) |
class_0_area_sqkm |
Non-hypoxic area (km²) |
class_1_area_sqkm |
Hypoxia zone extent (km²) |
Step 5: Export & Analyze¶
- Save as
area.csvfor downstream analysis - Plot monthly/seasonal time series
- Identify peak hypoxia periods
- Assess interannual variability
Key Outputs¶
Primary result: area.csv time series tracking:
- Hypoxia zone size (km²) per satellite acquisition
- Temporal patterns: seasonal cycles, event magnitude, persistence
- Comparisons: baseline years vs. anomalous years
Use cases:
- Validate against in-situ water quality networks (e.g., GCOOS buoys)
- Assess effectiveness of nutrient management policies
- Forecast dead zone extent for management decisions
- Publish spatial-temporal dynamics in publications
from rasterio.warp import calculate_default_transform, reproject, Resampling
path=r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
#find .tif
rasters=glob.glob(os.path.join(path,'*.tif'))
# read first
with rasterio.open(rasters[0]) as src:
array = src.read(1)
meta = src.meta
resolution = src.res
# crs
crs = src.crs
# Define the target CRS
dst_crs = 'EPSG:3814'
# Calculate the transform and the new dimensions
transform, width, height = calculate_default_transform(src.crs, dst_crs, src.width, src.height, *src.bounds)
# Create a new array to hold the reprojected data
dst_array = np.empty((height, width))
# Reproject the data
reproject(
source=array,
destination=dst_array,
src_transform=src.transform,
src_crs=src.crs,
dst_transform=transform,
dst_crs=dst_crs,
resampling=Resampling.nearest)
# print array and resolution
print(f"Original array shape: {array.shape}")
print(f"Reprojected array shape: {dst_array.shape}")
print(f"Original Resolution: {resolution}")
print(f"Original CRS: {crs}")
print(f"New CRS: {dst_crs}")
# reprojected resolution
dst_resolution = (transform.a, transform.e)
print(f"Reprojected Resolution: {dst_resolution}")
# reprojected resolution
dst_resolution = (transform.a, transform.e)
print(f"Reprojected Resolution: {dst_resolution}")
# Assuming 0 and 1 are the classes in your raster
# Assuming 0 and 1 are the classes in your raster
class_0_area_sqm = np.count_nonzero(dst_array == 0) * (dst_resolution[0] * -dst_resolution[1])
class_1_area_sqm = np.count_nonzero(dst_array == 1) * (dst_resolution[0] * -dst_resolution[1])
# Convert to square kilometers
class_0_area_sqkm = class_0_area_sqm / 1e6
class_1_area_sqkm = class_1_area_sqm / 1e6
print(f"Area of class 0: {class_0_area_sqkm} square kilometers")
print(f"Area of class 1: {class_1_area_sqkm} square kilometers")
# calculate area of each file in rasters and make dataframe: file name, class 0 area, class 1 area
df_area=[]
for raster in rasters:
with rasterio.open(raster) as src:
array = src.read(1)
meta = src.meta
resolution = src.res
crs = src.crs
dst_crs = 'EPSG:3814'
transform, width, height = calculate_default_transform(src.crs, dst_crs, src.width, src.height, *src.bounds)
dst_array = np.empty((height, width))
reproject(
source=array,
destination=dst_array,
src_transform=src.transform,
src_crs=src.crs,
dst_transform=transform,
dst_crs=dst_crs,
resampling=Resampling.nearest)
dst_resolution = (transform.a, transform.e)
class_0_area_sqm = np.count_nonzero(dst_array == 0) * (dst_resolution[0] * -dst_resolution[1])
class_1_area_sqm = np.count_nonzero(dst_array == 1) * (dst_resolution[0] * -dst_resolution[1])
class_0_area_sqkm = class_0_area_sqm / 1e6
class_1_area_sqkm = class_1_area_sqm / 1e6
name=raster[-18:-8]
# make dataframe, then append to list
df_temp=pd.DataFrame({'file':[name],'class_0_area_sqkm':[class_0_area_sqkm],'class_1_area_sqkm':[class_1_area_sqkm]})
df_area.append(df_temp)
del df_temp, array, dst_array, class_0_area_sqm, class_1_area_sqm, class_0_area_sqkm, class_1_area_sqkm
# convert list to dataframe
df_area=pd.concat(df_area,ignore_index=True)
# save to csv
df_area.to_csv(os.path.join(path,'area.csv'),index=False)
df_area
Original array shape: (850, 1936) Reprojected array shape: (976, 1960) Original Resolution: (0.008983152841195214, 0.008983152841195214) Original CRS: EPSG:4326 New CRS: EPSG:3814 Reprojected Resolution: (906.05699435686, -906.05699435686) Reprojected Resolution: (906.05699435686, -906.05699435686) Area of class 0: 1461363.0373606663 square kilometers Area of class 1: 109060.96201322679 square kilometers
| file | class_0_area_sqkm | class_1_area_sqkm | |
|---|---|---|---|
| 0 | 2002-07-01 | 1.461363e+06 | 109060.962013 |
| 1 | 2002-08-01 | 1.487167e+06 | 83257.198658 |
| 2 | 2002-09-01 | 1.524132e+06 | 46291.944892 |
| 3 | 2002-10-01 | 1.511844e+06 | 58579.763991 |
| 4 | 2002-11-01 | 1.525554e+06 | 44870.078064 |
| ... | ... | ... | ... |
| 323 | 2009-03-01 | 1.536844e+06 | 33579.700187 |
| 324 | 2009-04-01 | 1.538245e+06 | 32179.177781 |
| 325 | 2009-05-01 | 1.537058e+06 | 33366.255975 |
| 326 | 2009-06-01 | 1.506612e+06 | 63812.430942 |
| 327 | 2009-07-01 | 1.441411e+06 | 129013.070202 |
328 rows × 3 columns
Part 7: Spatial-Temporal Trend Analysis & Visualization¶
Multi-Scale Temporal Dynamics of Predicted Hypoxia¶
This section synthesizes spatial quantification results into publication-ready visualizations that reveal temporal patterns in hypoxia zone extent. Through monthly, seasonal, and annual analyses, we identify:
- Seasonal Cycles: When does hypoxia peak? (typically summer in Gulf of Mexico)
- Interannual Variability: How much does the dead zone size vary year-to-year?
- Extreme Events: Are there anomalous years or hurricane-driven changes?
- Trend Analysis: Is the long-term trajectory increasing or decreasing?
Workflow Overview¶
- Load Area Time-Series: Read
area.csvcontaining hypoxia zone area (km²) per satellite acquisition date - Extract Temporal Features: Parse dates; group by month, season, and year
- Generate Multi-Panel Figures:
- (a) Monthly Boxplot: Shows distribution per calendar month, revealing seasonal phase
- (b) Seasonal Violin Plot: Aggregates 3-month seasons with individual observations for pattern robustness
- (c) Time Series with Trend: Full temporal trajectory with regression line, mean baseline, and optional hurricane markers
- Statistical Summary: Compute descriptive statistics (mean, std, min, max) per month/season
- Cross-Validation with In-Situ: Compare satellite-derived extents against independent water quality buoys (if available)
Key Outputs¶
- Publication-ready figures with professional styling (fonts, colors, annotations)
- Numerical summaries of seasonal/annual hypoxia magnitude
- Anomaly detection: Years or months deviating significantly from baseline
- Trend confidence intervals: Bootstrap-based uncertainty on slope estimates
Annual Averages of Predicted Hypoxia Rasters¶
Compute per-year averages of predicted hypoxia rasters (2011–2020), stack all scenes for each year, ignore nodata/zero values, and visualize the mean intensity per pixel. The plot uses a shared color scale across years and saves a high-resolution figure for quick annual comparisons.
import matplotlib.pyplot as plt
import rasterio
import rasterio.plot
import numpy as np
import glob
import os
path = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
saved_file=r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\averagedPredicted"
fig, axs = plt.subplots(5, 2, figsize=(15, 30), constrained_layout=True) # Create a figure with 10 subplots
axs = axs.ravel() # Flatten the array of axes
# Define a variable to store the maximum intensity of the hypoxic zone across all plots
max_intensity = 0
for i, year in enumerate(range(2011, 2021)):
# Select only files containing the current year
rasters = glob.glob(os.path.join(path, f'*{year}*.tif'))
# Read all files and create an array stack
array_stack = []
for raster in rasters:
with rasterio.open(raster) as src:
array = src.read(1)
array_stack.append(array)
# Stack arrays along the first axis
array_stack = np.stack(array_stack, axis=0)
# Calculate the average
array_avg = np.nanmean(array_stack, axis=0) # Use np.nanmean to ignore NaN values
array_avg[array_avg == 0] = np.nan # Remove 0 values
# Extract geospatial information
with rasterio.open(rasters[0]) as src:
transform = src.transform
crs = src.crs
# Plot on the current subplot
im = axs[i].imshow(array_avg, extent=rasterio.plot.plotting_extent(src), cmap='viridis')
axs[i].set_xlabel('Longitude')
axs[i].set_ylabel('Latitude')
axs[i].set_title(f'Average of predicted hypoxic area ({year})', fontsize=14, fontweight='bold')
# Update max_intensity if necessary
max_intensity = max(max_intensity, np.nanmax(array_avg))
# Add a colorbar to the figure
cbar = fig.colorbar(im, ax=axs, orientation='horizontal', pad=0.02)
cbar.set_label('Intensity of Hypoxic Zone')
# save fig
#plt.savefig(r'D:\my works\bloom\figure\average_raster_data.jpg',dpi=600,bbox_inches='tight')
plt.show()
# clean memory
del array_stack, array_avg, transform, crs, im, cbar
Publication ready figures¶
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from mpl_toolkits.axes_grid1 import make_axes_locatable
path = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
saved_file = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\averagedPredicted"
os.makedirs(saved_file, exist_ok=True)
fig, axs = plt.subplots(4, 3, figsize=(18, 12),
gridspec_kw={'wspace': 0.05, 'hspace': 0.05},
subplot_kw={'projection': ccrs.PlateCarree()})
axs = axs.ravel()
years = list(range(2011, 2021))[:12] # 12 years for 3x4 grid
im = None # Initialize im for colorbar
for i, year in enumerate(years):
rasters = glob.glob(os.path.join(path, f'*{year}*.tif'))
if not rasters:
axs[i].axis('off')
continue
array_stack = []
for raster in rasters:
with rasterio.open(raster) as src:
array = src.read(1)
array_stack.append(array)
array_stack = np.stack(array_stack, axis=0)
array_avg = np.nanmean(array_stack, axis=0)
array_avg[array_avg == 0] = np.nan
with rasterio.open(rasters[0]) as src:
extent = rasterio.plot.plotting_extent(src)
im = axs[i].imshow(array_avg, extent=extent, cmap='jet', transform=ccrs.PlateCarree())
#axs[i].add_feature(cfeature.COASTLINE, linewidth=0.8, edgecolor='black', zorder=2)
axs[i].add_feature(cfeature.LAND, facecolor='lightgrey', zorder=0)
axs[i].annotate(f'({chr(97+i)})', xy=(0.01, 0.97), xycoords='axes fraction',
fontsize=14, fontweight='bold', ha='left', va='top')
# Gridlines: only left and bottom, no right/top labels
gl = axs[i].gridlines(draw_labels=True, linewidth=0.5, color='gray',
alpha=0.5, linestyle='--', zorder=3)
gl.top_labels = False
gl.right_labels = False
gl.xlabel_style = {'size': 10}
gl.ylabel_style = {'size': 10}
gl.left_labels = (i % 3 == 0)
gl.bottom_labels = (i // 3 == 3)
if not gl.left_labels:
axs[i].set_yticks([])
if not gl.bottom_labels:
axs[i].set_xticks([])
# Hide any unused subplots
for j in range(len(years), len(axs)):
axs[j].axis('off')
# Place colorbar in the last two empty subplots (axs[9] and axs[10])
from matplotlib.gridspec import GridSpec
gs = axs[0].get_gridspec()
# Remove the last two axes to make space for colorbar
# Create a new axis spanning the last row, columns 1 and 2
cax = fig.add_subplot(gs[3, 1:3])
cbar = fig.colorbar(im, cax=cax, orientation='horizontal', aspect=30)
cbar.set_label('Intensity of Hypoxic Zone', fontsize=14, fontweight='bold')
cax.set_aspect(0.08) # Lower value = thinner
# save fig
plt.savefig(os.path.join(saved_file, 'average_predicted_hypoxic_2011_2020.jpg'), dpi=600, bbox_inches='tight')
plt.show()
del array_stack, array_avg, im, cbar
# caption: Average predicted hypoxic zone intensity from 2011 to 2020:Here
Model Selection & Prediction Guidance¶
Base RF vs Tuned RF vs Calibrated¶
- Base Random Forest: Uses sensible defaults; fast baseline; good starting point but not tailored to this dataset’s class imbalance and feature patterns.
- Hyperparameter‑Tuned RF: Optimized via
RandomizedSearchCV(trees, depth, splits/leaves, features) for higher F1 with stratified CV; typically best for 0/1 classification performance. - Calibrated Model: Wraps the trained classifier with probability calibration (e.g., sigmoid/Platt) so scores better reflect true event rates; improves confidence thresholds and risk filtering.
Which is best for prediction?¶
- Binary raster predictions (0/1): Use the F1‑best tuned model — saved as
best_model.pkl. - Uncertainty/probability rasters: Prefer a calibrated version (e.g.,
best_model_calibrated.pkl); notebook falls back to raw probabilities when calibration is not available.
How the notebook wires this¶
- The training loop ranks models by F1 and writes the canonical winner to
best_model.pklfor downstream inference. - The uncertainty pipeline loads
best_model.pkland, a calibrated variant for probability maps; features are normalized with the savedMinMaxScaler.
Recommended next step¶
- Fit and save a calibrated wrapper for the final best model using
CalibratedClassifierCV(sigmoid, cv=3)to improve probability reliability for decision thresholds.
Ethical Considerations for Model Deployment¶
As we move from model development to real-world deployment, we must address critical ethical and practical concerns that arise when predictive models affect livelihoods, environmental decisions, and community well-being.
1. What accuracy is "good enough" when livelihoods are affected?¶
Our tuned Random Forest achieves 91.5% overall accuracy with F1 score of 0.918 and 92% recall for hypoxia detection. While these metrics appear strong, "good enough" depends on the cost of errors:
False Negatives (8%): Missing hypoxia events can harm fisheries, aquaculture, and ecosystem health. If we fail to warn about oxygen depletion, commercial fishers may waste fuel searching unproductive areas, oyster farms may suffer mass mortality, and managers may miss intervention opportunities.
False Positives (~10%): Incorrectly flagging healthy waters as hypoxic can trigger unnecessary closures, economic losses, and "cry wolf" effects that erode trust. Repeated false alarms may lead stakeholders to ignore future warnings.
Context-dependent thresholds: Rather than a single accuracy standard, we need decision thresholds calibrated to stakeholder needs. Aquaculture operators might prefer high recall (catch all hypoxia events, accept more false alarms) while recreational fishing advisories might balance precision-recall to maintain credibility. The calibrated model with probability outputs enables users to adjust thresholds based on their risk tolerance.
2. How do we communicate uncertainty to non-experts?¶
Our model produces probability maps, not binary certainties. Effective communication strategies include:
Visual uncertainty layers: Display predictions with color-coded confidence intervals (e.g., "High confidence hypoxia: red, Moderate uncertainty: yellow, Low confidence: gray"). Our uncertainty rasters show where satellite coverage or feature values lie outside training ranges.
Plain-language reports: Translate ROC-AUC (0.96) into actionable statements: "In 9 out of 10 cases, the model correctly distinguishes hypoxic from healthy waters." Avoid jargon like "calibration curves" or "precision-recall tradeoffs."
Scenario comparisons: Show multiple model outputs (base, tuned, calibrated) with explanations: "The tuned model prioritizes catching hypoxia events; the calibrated model provides more reliable risk percentages for decision-making."
Historical validation: Present backtesting results on past years: "When we tested 2019 predictions against actual observations, the model correctly identified 92% of hypoxia zones." Concrete examples build trust more than abstract metrics.
Limitations front-and-center: Explicitly state when the model is unreliable (e.g., "Predictions degrade during heavy cloud cover or near-shore turbidity; use with caution in estuaries").
3. Who has access to predictions (equity)?¶
Predictive tools can exacerbate inequalities if access is restricted:
Data justice: Small-scale fishers, indigenous communities, and subsistence harvesters often lack internet connectivity, technical literacy, or resources to access sophisticated dashboards. We must provide multiple access tiers:
- Web dashboards: For resource managers and researchers with broadband.
- SMS/text alerts: Low-tech notifications for communities with basic phones.
- Community meetings: In-person briefings where models are co-interpreted with local knowledge.
Open data & transparency: Publish predictions, training data, and model code openly (GitHub, public APIs) so independent scientists can validate results and communities can audit decisions affecting them.
Language & literacy: Translate outputs into local languages; use visual formats (maps, icons) instead of tables for audiences with low literacy.
Free-to-use licensing: Avoid paywalls or proprietary restrictions that exclude under-resourced users.
Our commitment: This notebook uses open-source tools (scikit-learn, MODIS data from NASA's free archive) and produces outputs compatible with free GIS software (QGIS) to maximize accessibility.
4. What happens when models are wrong?¶
Despite high accuracy, errors are inevitable. Responsible deployment requires accountability mechanisms:
Error reporting systems: Create channels (hotlines, apps) where fishers and managers can report discrepancies between predictions and observations. Feedback loops improve future model versions.
Graceful degradation: When satellite data is missing or model confidence is low, default to precautionary warnings rather than silence. Display messages like: "Insufficient data for high-confidence prediction; exercise caution."
Insurance & liability: Clarify that models are decision-support tools, not guarantees. Users remain responsible for ground-truthing and integrating predictions with local expertise. Avoid legal language implying infallibility.
Post-event analysis: After major hypoxia events, conduct forensic reviews: Did the model miss it? Why? Were thresholds too strict? Document lessons learned publicly to build credibility.
Hybrid approaches: Combine model outputs with in-situ sensors (dissolved oxygen buoys, citizen science observations) for real-time validation. Models guide where to deploy limited monitoring resources.
Example: If our model misses a hypoxia event (false negative), we investigate whether new oceanographic conditions (e.g., sudden stratification from hurricanes) fell outside training data patterns, then retrain with updated examples.
5. How do we maintain trust and transparency?¶
Trust erodes quickly if models are perceived as "black boxes" or tools for control rather than empowerment:
Explainable AI: Provide feature importance plots (e.g., "chlorophyll-a and SST were the strongest indicators this week") so users understand why a prediction was made. Our feature importance analysis shows satellite bands driving decisions.
Model cards: Publish standardized documentation (model cards) listing:
- Training data sources & timeframes (GCOOS 1970-2020, MODIS 2011-2020)
- Performance metrics by subgroup (deep vs. shallow water, summer vs. winter)
- Known failure modes (cloud-contaminated pixels, near-river plumes)
- Update frequency & versioning
Co-design with stakeholders: Involve fishers, aquaculture operators, and indigenous groups in defining prediction thresholds and dashboard features. Participatory design ensures tools meet real needs rather than academic priorities.
Independent audits: Invite third-party scientists or watchdog groups to validate model performance and check for biases (e.g., Do predictions favor wealthy commercial zones over subsistence areas?).
Version control & reproducibility: Archive model versions with DOIs; ensure anyone can replicate results. This notebook's use of saved model files (.pkl) and documented random seeds supports reproducibility.
Our approach: By publishing this full analysis pipeline (data extraction → training → validation → uncertainty quantification), we enable peer review and community scrutiny. Transparency is the foundation of trustworthy AI in environmental science.
Key Takeaway: Deploying hypoxia models is not just a technical challenge—it's a social contract. We must balance predictive power with humility about limitations, prioritize equity in access, and build feedback systems that center affected communities. High accuracy matters, but trustworthiness, accountability, and justice determine whether models genuinely serve the public good.
Next Directions¶
Part 8: Model Deployment & Scaling¶
Overview¶
This section provides a roadmap for deploying the hypoxia prediction model to production environments, creating interactive dashboards for stakeholder engagement, and scaling the pipeline for operational monitoring. We cover API development, dashboard frameworks, cloud deployment strategies, and recommendations for real-time prediction systems.
8.1 Deployment Architecture Options¶
Option 1: REST API Service (Flask/FastAPI)
- Expose model as HTTP endpoint for on-demand predictions
- Suitable for: Web applications, mobile apps, third-party integrations
- Pros: Language-agnostic, scalable with load balancers
- Cons: Network latency, requires server infrastructure
Option 2: Batch Processing Pipeline
- Schedule automated predictions on new satellite data (e.g., weekly/monthly)
- Suitable for: Routine monitoring, large-scale raster processing
- Pros: Efficient for bulk operations, can run offline
- Cons: Not real-time, requires orchestration (Airflow, Kubernetes CronJobs)
Option 3: Serverless Functions (AWS Lambda, Azure Functions)
- Deploy model as serverless function triggered by data uploads
- Suitable for: Event-driven workflows, cost-sensitive applications
- Pros: Auto-scaling, pay-per-use pricing
- Cons: Cold start latency, execution time limits
Option 4: Desktop/Edge Application
- Package model with GUI (Tkinter, PyQt) or CLI tool
- Suitable for: Field researchers, offline environments
- Pros: No internet dependency, full data control
- Cons: Limited to single-user, manual updates
8.2 Interactive Dashboard with Raster Maps¶
Create a web-based dashboard to visualize hypoxia predictions with uncertainty overlays. Users can explore temporal trends, compare scenarios, and export reports for decision-making.
Recommended Frameworks:
- Plotly Dash: Python-native, reactive components, good for scientific dashboards
- Streamlit: Rapid prototyping, minimal code, excellent for data science demos
- Panel (HoloViz): Jupyter-compatible, supports geospatial widgets (GeoViews, hvPlot)
- Leaflet + Flask: Custom JavaScript frontend with Python backend for advanced interactivity
Dashboard Components:
- Raster Map Viewer: Display prediction + uncertainty layers with zoom/pan controls
- Temporal Slider: Animate time-series of hypoxia extent (2011-2020)
- Metrics Panel: Show area statistics, confidence intervals, model performance
- Threshold Controls: Adjust uncertainty cutoffs to filter high-confidence zones
- Export Tools: Download GeoTIFFs, PNGs, CSVs for custom analysis
# Example: Simple Dashboard Prototype with Plotly Dash
# Install: pip install dash plotly rasterio
import dash
from dash import dcc, html, Input, Output
import plotly.graph_objects as go
import rasterio
import numpy as np
import glob
import os
# Initialize Dash app
app = dash.Dash(__name__)
# Load prediction rasters
raster_dir = r"C:\Users\hafez\MSU\Research\msGOM\mssound\bloom\data\Predicted rast"
pred_files = sorted(glob.glob(os.path.join(raster_dir, "*_hypoxia_pred.tif")))
uncertainty_files = sorted(glob.glob(os.path.join(raster_dir, "*_hypoxia_uncertainty.tif")))
# Extract dates from filenames for dropdown
dates = [os.path.basename(f)[-18:-8] for f in pred_files]
# Dashboard layout
app.layout = html.Div([
html.H1("Gulf of Mexico Hypoxia Prediction Dashboard", style={'textAlign': 'center'}),
html.Div([
html.Label("Select Date:"),
dcc.Dropdown(
id='date-dropdown',
options=[{'label': date, 'value': i} for i, date in enumerate(dates)],
value=0,
style={'width': '300px'}
),
], style={'padding': '20px'}),
html.Div([
html.Div([
html.H3("Hypoxia Prediction (Binary)"),
dcc.Graph(id='prediction-map')
], style={'width': '48%', 'display': 'inline-block'}),
html.Div([
html.H3("Prediction Uncertainty (Confidence)"),
dcc.Graph(id='uncertainty-map')
], style={'width': '48%', 'display': 'inline-block', 'float': 'right'})
]),
html.Div([
html.H3("Statistics"),
html.Div(id='stats-panel', style={'fontSize': '16px', 'padding': '20px'})
])
])
# Callback to update maps and statistics
@app.callback(
[Output('prediction-map', 'figure'),
Output('uncertainty-map', 'figure'),
Output('stats-panel', 'children')],
[Input('date-dropdown', 'value')]
)
def update_dashboard(selected_idx):
# Load prediction raster
with rasterio.open(pred_files[selected_idx]) as src:
pred_data = src.read(1)
bounds = src.bounds
# Load uncertainty raster
with rasterio.open(uncertainty_files[selected_idx]) as src:
uncert_data = src.read(1).astype(float)
# Create prediction heatmap
pred_fig = go.Figure(data=go.Heatmap(
z=pred_data,
colorscale=[[0, 'lightblue'], [1, 'red']],
colorbar=dict(title="Class", tickvals=[0, 1], ticktext=["Non-hypoxic", "Hypoxic"])
))
pred_fig.update_layout(
xaxis_title="Longitude",
yaxis_title="Latitude",
height=500
)
# Create uncertainty heatmap
uncert_fig = go.Figure(data=go.Heatmap(
z=uncert_data,
colorscale='Viridis',
colorbar=dict(title="Confidence (0-255)")
))
uncert_fig.update_layout(
xaxis_title="Longitude",
yaxis_title="Latitude",
height=500
)
# Compute statistics
total_pixels = np.sum(~np.isnan(pred_data))
hypoxic_pixels = np.sum(pred_data == 1)
hypoxic_pct = (hypoxic_pixels / total_pixels * 100) if total_pixels > 0 else 0
avg_uncertainty = np.nanmean(uncert_data)
high_conf_pixels = np.sum((uncert_data > 200) | (uncert_data < 55))
high_conf_pct = (high_conf_pixels / total_pixels * 100) if total_pixels > 0 else 0
stats_text = [
html.P(f"📅 Date: {dates[selected_idx]}"),
html.P(f"🌊 Hypoxic Area: {hypoxic_pct:.1f}% of pixels ({hypoxic_pixels:,} pixels)"),
html.P(f"📊 Average Uncertainty: {avg_uncertainty:.1f} (0=low, 255=high)"),
html.P(f"✅ High Confidence Pixels: {high_conf_pct:.1f}% (>{200} or <{55})")
]
return pred_fig, uncert_fig, stats_text
# Note: To run dashboard, execute: app.run_server(debug=True, port=8050)
# Then open browser to http://localhost:8050
print("Dashboard code ready. To launch, uncomment: app.run_server(debug=True, port=8050)")
8.3 REST API for Model Serving¶
Package the model as a RESTful API to enable integration with web applications, mobile apps, and third-party systems. Below is a FastAPI implementation for real-time hypoxia predictions.
# Example: FastAPI Model Serving Endpoint
# Save as api.py and run: uvicorn api:app --reload
# Install: pip install fastapi uvicorn pydantic joblib scikit-learn
"""
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np
import pandas as pd
from typing import List
app = FastAPI(title="Hypoxia Prediction API", version="1.0")
# Load model and scaler at startup
model = joblib.load('models/best_model.pkl')
scaler = joblib.load('models/scaler.pkl')
# Define input schema
class PredictionInput(BaseModel):
chlor_a: float
nflh: float
poc: float
sst: float
Rrs_412: float
Rrs_443: float
Rrs_469: float
Rrs_488: float
Rrs_531: float
Rrs_547: float
Rrs_555: float
Rrs_645: float
Rrs_667: float
Rrs_678: float
class BatchPredictionInput(BaseModel):
features: List[PredictionInput]
# Health check endpoint
@app.get("/")
def health_check():
return {"status": "healthy", "model": "best_model.pkl"}
# Single prediction endpoint
@app.post("/predict")
def predict(input_data: PredictionInput):
try:
# Convert input to DataFrame
features_df = pd.DataFrame([input_data.dict()])
# Scale features
features_scaled = scaler.transform(features_df)
# Make prediction
prediction = model.predict(features_scaled)[0]
probability = model.predict_proba(features_scaled)[0, 1]
return {
"prediction": int(prediction),
"probability_hypoxic": float(probability),
"class_label": "Hypoxic" if prediction == 1 else "Non-hypoxic"
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
# Batch prediction endpoint
@app.post("/predict_batch")
def predict_batch(batch_input: BatchPredictionInput):
try:
# Convert batch to DataFrame
features_df = pd.DataFrame([item.dict() for item in batch_input.features])
# Scale features
features_scaled = scaler.transform(features_df)
# Make predictions
predictions = model.predict(features_scaled)
probabilities = model.predict_proba(features_scaled)[:, 1]
results = [
{
"prediction": int(pred),
"probability_hypoxic": float(prob),
"class_label": "Hypoxic" if pred == 1 else "Non-hypoxic"
}
for pred, prob in zip(predictions, probabilities)
]
return {"predictions": results, "count": len(results)}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
# Model metadata endpoint
@app.get("/model_info")
def model_info():
return {
"model_type": str(type(model).__name__),
"input_features": 14,
"feature_names": [
"chlor_a", "nflh", "poc", "sst",
"Rrs_412", "Rrs_443", "Rrs_469", "Rrs_488",
"Rrs_531", "Rrs_547", "Rrs_555", "Rrs_645",
"Rrs_667", "Rrs_678"
],
"output_classes": ["Non-hypoxic", "Hypoxic"]
}
"""
print("FastAPI code ready. Save to api.py and run: uvicorn api:app --reload")
print("API will be available at: http://localhost:8000")
print("Interactive docs: http://localhost:8000/docs")
8.4 Cloud Deployment Strategies¶
AWS Deployment:
- Model Storage: S3 bucket for model artifacts (best_model.pkl, scaler.pkl)
- Compute: EC2 instance or ECS container running FastAPI/Flask
- Auto-scaling: Elastic Load Balancer + Auto Scaling Groups for variable traffic
- Batch Processing: AWS Batch + Lambda for scheduled raster predictions
- Monitoring: CloudWatch for logs, metrics, and alarms
Azure Deployment:
- Model Registry: Azure Machine Learning workspace for versioned models
- Compute: Azure Container Instances or App Service for API hosting
- Batch: Azure Batch for large-scale raster processing
- Storage: Azure Blob Storage for rasters and outputs
- Monitoring: Application Insights for performance tracking
Google Cloud Deployment:
- Model Hosting: AI Platform Prediction (Vertex AI) for managed endpoints
- Storage: Cloud Storage for rasters and model artifacts
- Batch Processing: Cloud Dataflow or Cloud Run jobs
- Orchestration: Cloud Composer (Airflow) for scheduled workflows
- Earth Engine Integration: Direct access to MODIS data via GEE Python API
Docker Containerization:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY models/ ./models/
COPY api.py .
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
Build: docker build -t hypoxia-api .
Run: docker run -p 8000:8000 hypoxia-api
8.5 Scaling Considerations for Operational Monitoring¶
Data Pipeline Optimization:
- Incremental Processing: Only process new/updated rasters (avoid recomputing all historical data)
- Parallel Processing: Use Dask or multiprocessing for multi-raster batch predictions
- Caching: Store preprocessed features (normalized arrays) to skip redundant scaling
- Tiling Strategy: Split large rasters into tiles for memory-efficient processing
Model Performance:
- Model Compression: Use ONNX or TensorFlow Lite for faster inference (50-100x speedup)
- Quantization: Convert float32 weights to int8 (4x memory reduction, 2-4x speed gain)
- Batch Inference: Process multiple pixels simultaneously (vectorized operations)
- GPU Acceleration: Use CUDA-enabled models for large raster processing (CuPy, PyTorch)
Monitoring & Maintenance:
- Prediction Drift: Track feature distributions over time; retrain if statistical shift detected
- Performance Metrics: Log latency (p50, p95, p99), throughput (predictions/sec), error rates
- Model Versioning: Use MLflow or DVC for experiment tracking and rollback capability
- Alerting: Set thresholds for anomalies (e.g., >20% high-uncertainty pixels, >50% hypoxic area)
Cost Optimization:
- Spot Instances: Use AWS Spot/Azure Low-Priority VMs for batch jobs (70% cost savings)
- Serverless: AWS Lambda for infrequent predictions (pay-per-invocation)
- Data Compression: Store rasters as COG (Cloud-Optimized GeoTIFF) with LZW compression
- Tiered Storage: Archive old predictions to S3 Glacier/Azure Cool Blob (90% cheaper)
8.6 Recommendations & Next Steps¶
Immediate Actions (Weeks 1-4)¶
- ✅ Validate Model on Independent Data: Test on 2021-2025 GCOOS buoy measurements not used in training
- ✅ Deploy Dashboard Prototype: Launch Streamlit/Dash dashboard for stakeholder demos
- ✅ Document API Specifications: Write OpenAPI/Swagger docs for /predict endpoints
- ✅ Containerize Application: Create Docker image with model + dependencies for reproducibility
Short-Term Enhancements (Months 1-3)¶
- ✅ Ensemble Models: Combine Random Forest + XGBoost predictions via voting/stacking for robustness
- ✅ Feature Engineering: Add derived features (chlor_a/sst ratio, temporal gradients, wind stress)
- ✅ Explainability Dashboard: Integrate SHAP force plots in UI to explain individual predictions
- ✅ Automated Retraining: Set up monthly model refresh pipeline with new satellite data
Medium-Term Goals (Months 3-6)¶
- ✅ Real-Time Predictions: Integrate with Google Earth Engine for daily/weekly forecasts
- ✅ Multi-Region Expansion: Extend model to Chesapeake Bay, Baltic Sea, East China Sea
- ✅ Alert System: Email/SMS notifications when hypoxia extent exceeds historical thresholds
- ✅ Mobile Application: Develop iOS/Android app for field researchers (offline mode)
Long-Term Vision (6-12 Months)¶
- ✅ Deep Learning Integration: Test LSTM/Conv-LSTM for spatiotemporal prediction (3-7 day forecasts)
- ✅ Data Fusion: Incorporate river discharge, wind speed, bathymetry, and ocean currents
- ✅ Operational Partnership: Collaborate with NOAA, EPA, or regional ocean observing systems
- ✅ Climate Scenarios: Model hypoxia under IPCC warming scenarios (RCP 4.5, 8.5)
Best Practices for Production Systems¶
- Version Control: Git for code, DVC/MLflow for data + model versioning
- Testing: Unit tests (pytest) for preprocessing, integration tests for API endpoints
- CI/CD: GitHub Actions or Jenkins for automated testing + deployment
- Documentation: Maintain README, API docs, and model cards (data provenance, limitations)
- Security: Implement API authentication (OAuth2, JWT), encrypt sensitive data at rest/in transit
- Accessibility: Follow WCAG 2.1 guidelines for dashboard UI (screen reader support, colorblind-safe palettes)
Key Performance Indicators (KPIs)¶
| Metric | Target | Current | Notes |
|---|---|---|---|
| Model F1 Score | >0.90 | 0.92 | ✅ Exceeds target |
| API Latency (p95) | <500ms | TBD | Measure after deployment |
| Dashboard Load Time | <3s | TBD | Test with 10+ rasters |
| Prediction Coverage | >90% pixels | TBD | Check cloud/nodata gaps |
| User Engagement | 50 monthly users | 0 | Post-launch metric |
Risk Mitigation¶
- Model Drift: Quarterly retraining with updated data; A/B test new models before full rollback
- Data Unavailability: Fallback to climatology predictions if satellite data missing >7 days
- Infrastructure Failure: Multi-region deployment with automated failover (99.9% uptime SLA)
- Privacy Concerns: Anonymize user queries; avoid logging sensitive location data
Resources & References¶
- Model Deployment: TensorFlow Serving, MLflow
- Dashboards: Plotly Dash, Streamlit
- Geospatial: Rasterio Docs, GDAL API
- Cloud Platforms: AWS ML, Azure ML, Google Vertex AI
- Hypoxia Research: NOAA NGOMEX, EPA Nutrient Criteria
🎯 Conclusion
This notebook provides a complete end-to-end pipeline for hypoxia prediction: from data integration and model training to spatial inference and temporal analysis. By following the deployment recommendations in Part 8, you can transition this research prototype into an operational monitoring system that supports environmental management decisions, public outreach, and scientific research.
Next steps: Run the dashboard code (Section 8.2), test the API endpoints (Section 8.3), and begin stakeholder engagement for feedback on visualization needs and alert thresholds.
Contact for deployment support: ha626@msstate.edu
Last updated: January 9, 2026
Notebook version: 1.0
Model version: best_model.pkl (Random Forest, F1=0.92)